INTRODUCTION


The Intel 80386 microprocessor was probably the most widely discussed central 
processing unit (CPU) chip since the introduction of the 8080 in the early days of 
personal computing. The first edition of this book explored the capabilities of the 
80386. Since then, Intel Corporation has introduced three additional processors with 
the same basic architecture. The 80386 family of processors now includes the original
80386, the 80386SX, the 80376, and the newest and fastest member of the family,
the 80486. I have expanded the book to describe the differences among the
processors.


Chapter 1 presents a history of the x86 microprocessor family. Each subsequent chap- 
ter discusses a portion of the 80386/80486 processor architecture. The organization 
of the CPU is presented in Chapter 2. The basic memory architecture is discussed in 
Chapter 3. Chapter 4 introduces the basic instruction set and the floating-point 
instruction set. Chapter 5 explains protected-mode operation. Chapter 6 tells how 
paging extends the memory system and how the cache works in the 80486. Com- 
patibility with previous processors via real mode, virtual 8086 mode, and protected 
mode for the 80286 is covered in Chapter 7. Finally, Chapter 8 provides a full instruc- 
tion set reference. 


This book focuses entirely on programming. It does not discuss the hardware fea- 
tures of the processor unless those features relate to specific instructions. If you are 
interested in the hardware characteristics of any of these processors, you can obtain 
the appropriate data sheets and reference manuals from Intel. 


To get the most from this book, you should be familiar with computer systems. In 
particular, an understanding of binary and hexadecimal arithmetic and machine- 
language programming for some other processor(s) will be helpful. 


A large portion of the book is devoted to protected mode. Although you do not 
need to understand this feature to program applications, it is important to under- 
stand protected mode to grasp why system designers made the choices they did 
in implementing the OS/2, Microsoft Windows, PC-MOS/386, and UNIX operating
environments.


The conventions used throughout this book are summarized on the following pages.
If you are familiar with other Intel microprocessors, you are probably already famil- 
iar with these concepts. 


Number Formats 


I use numbers in three bases: binary (base 2), decimal (base 10), and hexadecimal 
(base 16). You can assume that all numbers are base 10 unless they are followed by 
the suffix “B” (for binary) or “H” (for hexadecimal). For example:


1AH = 26 = 00011010B 
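The suffix convention is easy to check mechanically. The following is a minimal sketch in Python (not a language used elsewhere in this book); the function name `parse_intel_number` is hypothetical and exists only for illustration.

```python
def parse_intel_number(text):
    """Parse a number written with the H (hex) or B (binary) suffix convention.

    Numbers without a suffix are treated as decimal.
    """
    text = text.strip().upper()
    # Test the H suffix first: a hexadecimal value such as 1BH also contains a B.
    if text.endswith("H"):
        return int(text[:-1], 16)
    if text.endswith("B"):
        return int(text[:-1], 2)
    return int(text, 10)


# The three spellings from the example above denote the same value.
values = [parse_intel_number(s) for s in ("1AH", "26", "00011010B")]
```

All three entries of `values` are the same integer, which is the point of the equality shown above.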


Data Types 


The most commonly used data types are 8-bit, 16-bit, and 32-bit quantities. In this 
book, an 8-bit quantity is called a byte, a 16-bit quantity is called a word, and a 32- 
bit quantity is called a doubleword, or dword. This nomenclature is unusual because 
the standard data item size of a computer is commonly called a word. In the Digital 
Equipment VAX computers, for example, a 32-bit quantity is a word, and a 16-bit 
quantity is a halfword. The same is true for the Motorola 68000 family and the IBM 
370 and 390 mainframes. 


Although the standard 80386/80486 operand size is 32 bits, Intel retained the nam- 
ing conventions of its earlier processors because the 32-bit processors are de- 
scendants of the 8086 and the 80286, which were 16-bit processors. This simplifies 
running software from the 8086 or the 80286 and lets you use the same assembler to 
generate code for any of the four processors. 


The smallest addressable data item in the x86 family is the byte. All other data items 
can be broken down into bytes. The processor stores larger data items in memory 
low-order byte first, as the following diagram shows: 


             Bits  7      0
1 byte             [ byte ]

             Bits  7        0 15         8
16-bit word        [ low byte | high byte ]

             Bits  7        0 15    8 23   16 31        24
32-bit dword       [ low byte |      |       | high byte ]


Assume that the 32-bit value 100F755DH is stored in memory, beginning at location 
10. The individual memory bytes are: 


Address     10     11     12     13
Contents    5DH    75H    0FH    10H
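The same layout can be reproduced with a short Python sketch. The helper names `store_little_endian` and `load_little_endian` are hypothetical, and memory is modeled as a plain dictionary of address-to-byte entries rather than real hardware.

```python
def store_little_endian(value, size, base_address):
    """Return {address: byte} for a value stored low-order byte first."""
    raw = value.to_bytes(size, byteorder="little")
    return {base_address + i: byte for i, byte in enumerate(raw)}


def load_little_endian(memory, base_address, size):
    """Reassemble a value from consecutive bytes stored low-order byte first."""
    raw = bytes(memory[base_address + i] for i in range(size))
    return int.from_bytes(raw, byteorder="little")


# Store the dword 100F755DH beginning at location 10, as in the text.
memory = store_little_endian(0x100F755D, 4, 10)
```

Here `memory` maps addresses 10 through 13 to 5DH, 75H, 0FH, and 10H, and `load_little_endian(memory, 10, 4)` recovers the original value.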


It is unnecessarily complex, however, to show words and doublewords broken down
in byte order, and illustrations in this book treat the quantity as a unit. For example, 
the book would present the previous value as: 


31 0 


100F755DH


When performing operations on items smaller than a single byte—for example, on 
a single bit or bit field—the processor always fetches at least 1 byte from memory. 


Assembler Notation 


An executable instruction is a binary pattern that is decoded by the logic inside the 
CPU. An instruction can be from 8 to 128 bits in length. Because coding a program 
using binary patterns would be tedious, programmers use a type of program called 
an assembler. The simplest type of assembler takes a set of keywords and symbols 
and translates them into an instruction. The set of keywords and symbols is called 
the assembly language. Typically, there is a one-to-one mapping between an in- 
struction in assembly language and an actual machine instruction. The assembler 
would take an instruction such as:


ADD EBX, 5 


meaning, “Add 5 to the value in register EBX and store the result in EBX,” and 
would translate it into the bit pattern: 


100000111100001100000101B
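As a rough check, the pattern can be rebuilt from its three bytes. This Python sketch assumes the compact encoding an assembler would normally choose in a 32-bit code segment: opcode 83H (add a sign-extended 8-bit immediate), a ModRM byte of C3H selecting EBX, and the immediate 05H. The helper name `to_bit_pattern` is hypothetical.

```python
def to_bit_pattern(code):
    """Render machine-code bytes as one binary string with the B suffix."""
    return "".join(f"{byte:08b}" for byte in code) + "B"


# ADD EBX, 5 in its short form: opcode, ModRM, 8-bit immediate.
ADD_EBX_5 = bytes([0x83, 0xC3, 0x05])
pattern = to_bit_pattern(ADD_EBX_5)
```

The result is a 24-bit pattern, so this particular instruction occupies 3 bytes.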


The names of the instructions, called mnemonics, usually occupy the first field in 
an instruction line. The subsequent fields are the operands of the instruction and 
can take a number of forms. The simplest is a numeric value, such as the 5 in the 
example above. A register name is another form of operand. An expression within 
brackets, such as [EBP+2], signifies an operand that is a memory reference. 


Throughout the book, I use standard Intel mnemonics. Notice, however, that a 
mnemonic does not necessarily specify the exact encoding of an instruction. For 
example, the “increment” instruction has a general form in which any operand may 
be encoded, and the instruction INC EAX would be encoded as FFH C0H. A single-byte
instruction also exists for incrementing a general register. In this form, the INC
EAX instruction is encoded 40H. An assembler will generally choose the most com- 
pact form of instruction for any given mnemonic, but the effect of executing either 
form is the same. 
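The two encodings of INC can be generated side by side. The sketch below is illustrative Python under stated assumptions: a 32-bit code segment, the standard register numbering (EAX = 0 through EDI = 7), and hypothetical helper names `inc_short` and `inc_general`. It covers only register operands, not memory operands.

```python
# Register numbers as used in the opcode and ModRM fields.
REGISTERS = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def inc_short(register):
    """Single-byte form: opcode 40H plus the register number."""
    return bytes([0x40 + REGISTERS.index(register)])


def inc_general(register):
    """General form: opcode FFH followed by a ModRM byte.

    ModRM = mod (11 = register operand) | opcode extension (000 = INC) | r/m.
    """
    modrm = (0b11 << 6) | (0b000 << 3) | REGISTERS.index(register)
    return bytes([0xFF, modrm])


short_form = inc_short("EAX")
general_form = inc_general("EAX")
```

Both byte sequences increment EAX; the short form simply saves one byte.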


I also use a common convention in discussions about setting bits. I use the term 
“set” when assigning the value 1 to a bit, and the term “reset” when assigning the 
value 0 to a bit. 
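In terms of ordinary bitwise operations, the two terms can be sketched as follows. This is illustrative Python with hypothetical helper names; bit 0 is taken to be the low-order bit.

```python
def set_bit(value, bit):
    """Set: assign 1 to the given bit, leaving the others unchanged."""
    return value | (1 << bit)


def reset_bit(value, bit):
    """Reset: assign 0 to the given bit, leaving the others unchanged."""
    return value & ~(1 << bit)
```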
Syntax


This book uses the following syntax: 


Operator   Meaning           Operator   Meaning

+          Addition          &          Boolean AND
-          Subtraction       >          Greater than
x          Multiplication    <          Less than
/          Division          >>         Shift right
~          Not               <<         Shift left
=          Equal to          ≤          Less than or equal to
!=         Not equal to      ≥          Greater than or equal to
|          Or                <-         Assignment
^          Exclusive OR


32-Bit Instruction Set 


The 80386, 80386SX, and 80486 support several modes that are compatible with 
previous Intel processors (the 16-bit 8086 and 80286). However, this book focuses 
on new features and does not discuss the 16-bit architectures of the 8086 and the 
80286, even though they are a subset of the 80386/80486 processors' capabilities.
Programmers using either the 80386 or 80486 as a replacement for previous pro- 
cessors can simply continue to use reference materials for the 8086 or the 80286. 


Operating System Services 


The 80386 family architecture is quite complex, and it is not reasonable to expect a 
stand-alone program to take advantage of all the CPU’s capabilities. At various times, 
I make statements such as “The operating system will...” or “At this point, the oper- 
ating system....” In these cases I am not referring to any particular operating sys- 
tem; instead, I am highlighting a feature that will be implemented by the operating 
system software and not by an application. 


EVOLUTION OF THE 80x86-FAMILY ARCHITECTURE


Although I have spent more than a decade working with microcomputers, the 
phrase “computer system” still brings to mind images of the installation in the base- 
ment of the campus library at Montana State University. There, in air-conditioned 
comfort, behind glass walls, lived Siggie, the university computer system (a Xerox 
Sigma 7). Housed in several refrigerator-size units, Siggie served the computing 
needs of the entire university. 


By 1986, the 80386 microprocessor, born of a technology that was first realized 
while Siggie was still considered state-of-the-art, could serve as the heart of a desk- 
top microcomputer that had greater computing power than Siggie. And now the 
even faster 80486 is merely one more member of a processor family that Intel claims 
will be continuously improved through the year 2000. 


The First Components 


The 80486 is the latest member of a line of microprocessors built by Intel Corpora- 
tion. Intel claims to have invented the microprocessor in 1971, as a result of having 
been approached by a (now defunct) Japanese corporation to build a custom circuit 
to serve as the “brains” for a new calculator. Intel designer Ted Hoff proposed that a 
programmable, general-purpose computing circuit be built instead, and the 4004 
chip became a reality. The 4040 and 8008 chips soon followed; however, these 
chips lacked many characteristics of microprocessors as we know them today. 


MICROSOFT’S 80386/80486 PROGRAMMING GUIDE 


The 8080 


The chip that, by most accounts, led to the birth of the microcomputer industry 
was the 8080, which Intel introduced in 1974. An article in the September 1975 issue 
of Popular Electronics brought the idea of a “personal” computer to the mass mar- 
ket, and, as they say, the rest is history. The 8080 was the CPU (central processing 
unit) in such pioneering systems as the Altair and the IMSAI. Intel did not enjoy a 
monopoly on the market for long, however; Motorola Corporation introduced the 
6800, MOS Technology responded with the 6502, and two designers of the 8080 left 
Intel for Zilog Corporation, which soon produced the Z80. Unlike the 6800 and the 
6502, whose architectures were completely different from those of Intel processors, 
the Z80 was compatible with the 8080 but had an expanded instruction set and ran 
twice as fast. The battle for CPU supremacy was on. 


The 8080 was an 8-bit machine—that is, it processed data 8 bits at a time. It had a 
single accumulator (the A register) and six secondary registers (B, C, D, E, H, and L, 
shown in Figure 1-1). These six registers could be used in 8-bit arithmetic operations 
or combined as pairs (BC, DE, and HL) to hold 16-bit memory addresses. A 16-bit address
allowed the 8080 to access 2^16 bytes, or 64 kilobytes (KB), of memory.
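The arithmetic behind the register pairs and the 64-KB limit can be sketched in a few lines of Python. The function name `register_pair` is hypothetical; the first register of a pair is assumed to supply the high-order byte, as H does in HL.

```python
def register_pair(high, low):
    """Combine two 8-bit registers (for example H and L) into a 16-bit address."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ADDRESS_BITS = 16
ADDRESSABLE_BYTES = 2 ** ADDRESS_BITS          # number of distinct 16-bit addresses
ADDRESSABLE_KB = ADDRESSABLE_BYTES // 1024     # the same figure in kilobytes

# The highest address a pair can hold is FFFFH, the last byte of the 64-KB space.
highest = register_pair(0xFF, 0xFF)
```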


Intel also developed a refinement of the 8080 called the 8085, an 8080-compatible 
processor that featured better performance and a simpler hardware interface. 


Figure 1-1. The 8080 register set. 


The 8086 


In 1978, under pressure from other manufacturers’ faster, more powerful micropro- 
cessors, Intel moved to a 16-bit architecture. The 8086 was touted as the successor 
to the 8080 microprocessor, and, although the instruction set was new, it retained 
compatibility with the 8080’s instruction set. Figure 1-2 shows how the new registers 
of the 8086 could be mapped into the set of 8080 registers. 


Programs that were written for the 8080 could not be run on the 8086; however, 
almost every 8086 instruction corresponded to an 8080 instruction. At worst, an 
8080 instruction could be simulated by two or three 8086 operations. An Intel
translator program could convert 8080 assembler programs into 8086 assembler
programs, and the first versions of Microsoft Corporation’s BASIC and MicroPro
International Corporation’s WordStar for the 8086 were ported from 8080 systems via
the Intel translator. This concern for compatibility has characterized Intel’s presence 
in the microcomputer market. Every new generation of microprocessor has been 
able to run software written for the previous generation. 


Figure 1-2. The 8080-8086 register set map.


In addition to providing software compatibility, Intel was interested in supporting 
high-level languages. At Intel, almost all programming was done in an Algol-like 
language called PL/M. Intel believed that a language such as PL/M or Pascal would 
become the dominant microcomputer development language, so Intel dedicated 
many 8086 registers to specific purposes, as shown in Figure 1-3. 


AX (AH, AL)    Accumulator
BX (BH, BL)    Base pointer
CX (CH, CL)    Count register
DX (DH, DL)    Data register

DI             Destination index register
SI             Source index register
BP             Stack frame base pointer
SP             Stack pointer

IP             Instruction pointer

CS             Code segment
DS             Data segment
SS             Stack segment
ES             Extra segment


Figure 1-3. The 8086 register set.
The next two examples show dedicated registers in use. Figure 1-4 shows how
high-level languages such as Pascal use the stack pointer (SP) and base pointer (BP)
registers.


Pascal code                     Stack frame       Variable addressing mode

procedure proc1 (a, b: int)     (parameters)      [BP + offset]
int i;                          Old IP
real j;                         Old BP            <- BP
begin                           (locals)          [BP - offset]
...                                               <- SP
end


Figure 1-4. Subroutine context.


In a Pascal program, the context of the currently executing subroutine is maintained 
on the stack. The values (parameters) provided to the subroutine by the calling routine
are first on the stack, the saved IP of the calling routine is second, and the
saved BP of the calling routine is third. The context also contains stack space for
any temporary or local variables that the subroutine uses. Access to either the pa- 
rameters or local variables is relative to the current value of BP. 
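The frame layout can be modeled in a few lines of Python. This is a sketch under several assumptions that the text does not spell out: a near call (only IP is saved), 2-byte stack words, parameters pushed left to right, and two word-size locals. The class `Stack16` and the function `call_proc1` are hypothetical names.

```python
class Stack16:
    """A toy 16-bit stack that grows toward lower addresses."""

    def __init__(self, top=0x0100):
        self.memory = {}   # address -> 16-bit word
        self.sp = top      # stack pointer
        self.bp = 0        # base (frame) pointer

    def push(self, value):
        self.sp -= 2
        self.memory[self.sp] = value & 0xFFFF

    def word_at_bp(self, offset):
        """Read the word at [BP + offset]; offset may be negative."""
        return self.memory[self.bp + offset]


def call_proc1(stack, a, b, return_ip):
    """Build the frame for proc1(a, b) the way a compiler prologue would."""
    stack.push(a)              # parameters, pushed by the caller
    stack.push(b)
    stack.push(return_ip)      # saved IP, pushed by the CALL instruction
    stack.push(stack.bp)       # saved BP of the caller (PUSH BP)
    stack.bp = stack.sp        # new frame base (MOV BP, SP)
    stack.sp -= 4              # reserve two word-size locals (SUB SP, 4)
    stack.memory[stack.bp - 2] = 0   # first local, at [BP - 2]
    stack.memory[stack.bp - 4] = 0   # second local, at [BP - 4]
```

With this layout the parameters sit at positive offsets from BP ([BP + 6] and [BP + 4]), the saved IP at [BP + 2], the saved BP at [BP + 0], and the locals at negative offsets.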


Consider the Pascal assignment statement in Figure 1-5. Because an entire record 
must be copied, the compiler generates a block move instruction that uses the SI, 
DI, and CX registers. 


The advantage of dedicated registers is that they allowed Intel to encode the instructions
in a compact, memory-efficient manner. The opcode specifies exactly what is
to take place; for example, in the MOVSB instruction, specifying the three operands
(source, destination, and count) is unnecessary. As a result, the MOVSB opcode is
only 1 byte. The disadvantage of dedicated registers is that MOVSB works only with
SI and DI; if those registers already hold other values, you cannot substitute
another register and must save and reload them.
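The effect of a repeated MOVSB can be approximated in Python. This sketch assumes the direction flag is clear (so SI and DI count upward) and that source and destination lie in one flat block of memory, which sidesteps the separate segments the real instruction uses. The function name `rep_movsb` is hypothetical.

```python
def rep_movsb(memory, si, di, cx):
    """Copy cx bytes from memory[si...] to memory[di...], one byte per step.

    Returns the final (si, di, cx), mirroring the register state afterward.
    """
    while cx != 0:
        memory[di] = memory[si]   # MOVSB: move one byte
        si += 1                   # both index registers advance
        di += 1
        cx -= 1                   # REP: decrement the count and repeat
    return si, di, cx


# Copy a 4-byte "record" at offset 0 to offset 4.
memory = bytearray(b"ABCD" + bytes(4))
final_si, final_di, final_cx = rep_movsb(memory, si=0, di=4, cx=4)
```

Note that the registers are fixed by the operation itself: the caller has no way to name different source, destination, or count registers.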


Pascal code                  Assembly code

var
  i, j : employee_rec;
begin
  i := j;                    lea  di, i             ; destination
                             lea  si, j             ; source
                             mov  cx, SIZEOF(rec)
                             rep  movsb


Figure 1-5. Block move.


The 8086 also introduced segmentation to the microprocessor world. A segment is a
block of memory beginning at a fixed address that is determined by the value in the
appropriate segment register. This concept, probably the most despised feature of
the 8086 because of the restrictions it imposes, was incorporated for compatibility
with the 8080; each segment was 64 KB, equivalent to one 8080 address space.
Using segmentation, software can maintain the 16-bit addressing used in the 8080
while expanding (through the use of multiple segments) the memory that the chip
can address. The 8086 provides four segment registers that can point anywhere in
the 1-megabyte (MB) address space. They are defined as follows:


CS—The code segment register: All calls and jumps refer to locations within the 
code segment. 


DS—The data segment register: Most memory-reference instructions refer to an 
offset within the data segment. 


SS—The stack segment register: All PUSH and POP instructions access data in
the stack segment. Additionally, any memory reference done relative to the BP 
register is also directed to the stack segment. 


ES—The extra segment register: This segment specifies the destination seg- 
ment in certain string processing instructions. 
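How a segment register and a 16-bit offset combine into a 1-MB address can be sketched in Python. The passage above does not state the rule; the sketch relies on the standard 8086 scheme, in which the segment value is multiplied by 16 and added to the offset, with the result truncated to 20 bits. The function name `physical_address` is hypothetical.

```python
SEGMENT_SIZE = 0x10000        # 64 KB reachable through one segment register


def physical_address(segment, offset):
    """Form a 20-bit 8086 address from a segment value and a 16-bit offset."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


# A segment value of 1234H places the segment base at 12340H.
base = physical_address(0x1234, 0x0000)
last = physical_address(0x1234, 0xFFFF)
```

The distance from `base` to `last` is 64 KB minus one byte, which is why reaching more memory means loading a different value into a segment register.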


The way an application manages memory (the memory model) is usually consistent
throughout a program. When Intel introduced the 8086, three memory models were 
postulated, which are shown in Figure 1-6 on the following page. 


The tiny model mimicked the 8080 address space. The code segment and data seg- 
ment were in the same area of memory, and the program was limited to 64 KB. The 
small model was expected to be prevalent because it allowed programs to double in 
size. By having separate code and data segments, programs could expand to 128 KB 
and still retain 16-bit addressing. The large memory model allowed the use of mul- 
tiple code and data segments. In this model, the entire 1-MB address space of the 
processor could be used. 
Tiny          Small          Large


Figure 1-6. Memory models. 


When the 8086 was introduced in 1978, most microcomputers were limited to 64 
KB; almost no one realized how quickly the 64-KB segment limitation would 
become a serious problem. Although the large model allowed programs to fill the 
entire 1 MB of 8086 address space, using the large model meant using 32-bit point- 
ers. On a 16-bit machine, 32-bit pointers exacted a size and performance penalty 
that most programmers were unwilling to pay. By the early 1980s, even the 1-MB 
limitation became confining. Additional memory models with names such as “com- 
pact” and “medium” were introduced to optimize performance for special program- 
ming needs. 


Other processors in the 8086 family were the 8088, the 80186, and the 80188. The 
8088, introduced a year after the 8086, had the same 16-bit internal architecture but 
a restricted 8-bit external bus. The 8088 could run the same programs as the 8086 
but typically ran them 30 percent slower. The 8088 became wildly successful when 
IBM chose it for the PC and the PC/XT. The 80186 and 80188 were announced much 
later, in 1982. These processors kept the same base architecture but included fea- 
tures such as direct memory access (DMA) controllers, on-chip counter/timers, and 
a simplified hardware interface. They also operated more quickly than did the 
8086/8088 and became popular in controller applications. 


The 8087 


An innovative part of the 8086 family of CPUs is the coprocessor. The ESC or 
coprocessor escape class of instructions generated only a memory address on the 
8086. Additional, special-purpose CPUs could be created to monitor the instruction 
stream and watch for ESC sequences, as shown in Figure 1-7. Whenever an ESC was 
detected, the coprocessor could decode the escape as an instruction for itself and 
perform a function that the 8086 was incapable of doing efficiently on its own. 
[Diagram: instruction path shared by the 8086 and the 8087]


Figure 1-7. 8086 coprocessor interface.


The only coprocessor developed for the 8086 was the 8087. The 8087 implemented 
a floating-point instruction set, capable of as much as 80 bits of precision. Intel 
worked closely with the Institute of Electrical and Electronics Engineers (IEEE) and
professors at the University of California, Berkeley, to create a floating-point repre- 
sentation that was flexible and accurate. This representation and its numeric prop- 
erties have since been formalized as Standard IEEE-754. 


The 8087 contributed to the popularity of the 8086. A desktop computer that con- 
tained both an 8086 and an 8087 could do more substantial scientific work than the 
8086 alone. Implementing floating-point functions in hardware improved the per- 
formance of mathematical calculations over existing software routines. However, 
the 8087 exemplified the problems of the 64-KB segment size. As soon as scientists 
and engineers had the computing power to handle real-world problems, they often 
needed to deal with large arrays of numbers. The 64-KB segment limit restricted a 
vector of double-precision (8-byte) floating-point numbers to no more than 8192 elements.
Software capable of getting around the restriction was soon available, but the large 
memory model was difficult to program in and was slow. 


The 80286 


The next major introduction from Intel, the 80286, came in 1982. The 80286 is com- 
patible with the 8086 family, but it also provides a significant performance improve- 
ment. It boasts two operating modes: real mode and protected mode. Real mode, 
which emulates the 8086, is the default mode. The new mode is called protected 
mode. In protected mode, the 80286 supports the 8086 instruction set but places a 
new interpretation on the contents of the segment registers that control how 
memory is accessed. 


Although operating systems that are implemented under protected mode are differ- 
ent from those that are designed for real mode, applications can be developed that 
run in either mode. The design of these dual-mode applications requires that the 
application observe certain memory restrictions. 
Unfortunately, MS-DOS, which is the dominant operating system for 8086-based
machines, places no restrictions on how an application addresses memory, and protected
mode proved incompatible with a majority of MS-DOS applications. As a
result, for a number of years the 80286 was generally treated as a fast 8086 because
no one knew how to use the beneficial new feature: protected mode. This was
unfortunate because protected mode expands the amount of physically addressable 
memory from 1 MB to 16 MB, allows the implementation of virtual memory, and 
provides for the separation of tasks in a multitasking or multiuser environment. 
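The two memory limits follow directly from the width of the physical address. The Python sketch below assumes 20 address bits for the 8086 and real mode and 24 address bits for 80286 protected mode; those widths are standard figures, not taken from the passage above, and the function name is hypothetical.

```python
MB = 1024 * 1024


def address_space_bytes(address_bits):
    """Number of bytes reachable with the given number of address bits."""
    return 2 ** address_bits


real_mode_mb = address_space_bytes(20) // MB        # 8086 and 80286 real mode
protected_mode_mb = address_space_bytes(24) // MB   # 80286 protected mode
```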


Versions of UNIX run in protected mode, but UNIX has not been successful on the 
80286 because competitive products usually run on more powerful 32-bit com- 
puters. Subsequently, Microsoft introduced OS/2, which uses almost all protected- 
mode features, and more recently introduced Windows 3, which also runs applica- 
tions in protected mode. 


The 80286 is the first Intel microprocessor designed for “serious” computing. Provi- 
sions were made for multitasking, data integrity, and security. The designers ex- 
amined the architecture of minicomputers and mainframes as they developed the 
80286. In addition, two of the main influences on the 80286 designers were the 
Multics project and a continued belief that Pascal would become the preeminent 
application-development language. 


Reading the conference papers about the Multics project will enlighten anyone who 
thinks that protected mode is the product of some Intel designer’s fevered imagina- 
tion. Multics began in the mid-1960s as a joint research project among MIT, Bell 
Laboratories, and General Electric. The project combined hardware and software 
and was based on the GE 645 mainframe. The following is a partial list of architec- 
tural features that the Multics group “pioneered”: 


■ Virtual memory*

■ Protection rings

■ Segmented addressing*

■ Descriptor access rights

■ Call gates

■ Conforming code segments


* The Multics group did not invent these features, but it made them an integral part of the system.


Some features of Multics also made their way into existing 80286-based software
systems. Microsoft’s OS/2, for example, uses dynamic linking, another Multics
innovation.


The influence of Pascal on the design of the 80286 is shown by the addition of the
ENTER instruction to the 80286 instruction set. The ENTER instruction simplifies
creating a stack frame such as the one shown in the subroutine context illustration
in Figure 1-4. ENTER can also copy the context or stack frame of the previous subroutine.
This ability is not necessary in languages such as FORTRAN or C but is useful
in languages such as Pascal and Ada that allow nested procedure declarations.


The 80287 


Intel also introduced a new coprocessor for the 80286, but the 80287 was a bit of a 
disappointment. Although the 80286 executes programs two to three times faster 
than the 8086, the performance of the 80287 is about the same as that of the 8087. 
Intel did not really modify the computational engine of the 8087 in creating the 
80287, so the new coprocessor does not run any faster. Intel did change the inter- 
face between the CPU and the coprocessor, however, eliminating the need for the 
coprocessor to monitor the instruction stream of the main CPU. 


In this new interface method, illustrated in Figure 1-8, the main CPU decodes the 
ESC instructions and then passes the information to the coprocessor via the I/O 
channel. Because addressing is treated differently in real mode than it is in pro- 
tected mode, the coprocessor would have had to operate in different modes as well, 
using the old interface method. Instead, the new interface requires the 80286 to vali- 
date all addresses before signaling the 80287. This interface allows the coprocessor 
to run at a clock rate different from that of the main CPU, and it also allows the 
80287 to be used with CPUs other than the 80286. 


[Diagram: instruction path, 80286 and 80287]


Figure 1-8. 80286 coprocessor interface.


Competitive Pressures 


Between the introduction of the 8086 and the 80286, Motorola developed what 
became the strongest competition to Intel’s dominance of the microprocessor 
market, the 68000 family. Several features of the Motorola microprocessors were at- 
tractive to the development community. The 68000 family incorporates a 32-bit in- 
ternal register file for data and addressing. This allows a large application address 
space without the limitation of 64-KB segments. This 32-bit capability also makes it 
easy to port operating systems (such as UNIX) and minicomputer applications to 
the 68000-family processors. 
Motorola also boasted about the “orthogonality” of the 68000 instruction set.
Unlike the 8086 and the 80286, with their special-purpose registers, the 68000
allowed programmers to specify any register for a given instruction. Although all 
68000 microprocessors had 32-bit register files, the first two CPUs (68000 and 68010) 
were limited to 24-bit addresses and a 16-bit memory interface. In 1985, however, 
Motorola began sampling the 68020, which had a full 32-bit address bus and a 32-bit 
data bus. Although Intel had most of the business microcomputer market, makers of 
scientific and engineering workstations almost unanimously chose Motorola CPUs 
for their products. 


Intel’s 32-Bit Microprocessor 


Intel’s design engineers faced two problems: compatibility and performance. They 
needed to maintain compatibility with the previous generation of processors to re- 
tain their share of the PC business market; Intel’s marketing force frequently referred 
to the “billions and billions” of bytes of code (applications) that the 80386 had to 
be able to run. At the same time, they needed a product that would address the 
shortcomings of the 8086-family architecture, which gave Motorola an edge in 
scientific and engineering markets. The resulting product, the 80386, addressed 
these issues by operating in a number of modes. At boot time, it operates in real 
mode like the 80286 and is nothing more than a very fast 8086. It uses 16-bit regis- 
ters and the 8086 segmentation scheme, and it is subject to the 1-MB memory 
limitation. 


But the 80386 can also be switched to protected mode. In protected mode, each 
segment is marked by a bit that designates whether the segment is a protected- 
mode segment containing 16-bit 80286 code or a 32-bit protected-mode segment. 
Programs residing in 32-bit segments can use the extended address space (segments 
larger than 64 KB) and additional features, including array indexing, orthogonal use 
of the register set, and special debugging capabilities not found in previous 
processors. 


A protected-mode operating system can also create a task that runs in virtual 8086 
mode. An application running in this mode believes that it is running in real mode 
or on an 8086. However, the operating system can designate certain classes of input/output
(I/O) operations that it will not allow. If the application attempts to violate
any operating system rules, an interrupt is generated that transfers control from
the application to the operating system. By examining the instruction that the appli- 
cation was trying to execute, the operating system can choose to block the applica- 
tion from running, simulate the operation, or ignore it and let the application 
continue. The operating system also maps the 1-MB 8086 address space that the ap- 
plication believes it is running under to the actual memory space that the operating 
system wants the application to use. A protected-mode operating system can estab- 
lish multiple virtual 8086 tasks. 


The 80386 also extends the similarities between the Intel architecture and the
Multics system. Like Multics, the 80386 integrates the ability to perform demand 
paging (a virtual-memory technique used in minicomputers and mainframes) with 
segmentation. 


Intel also continued a tradition it began with the 8088: It offered a low-cost version
of the processor. The 80386SX is identical internally to the 80386. However, it has
only a 16-bit external data bus and a 24-bit address bus, and it is generally available
at slower clock speeds than the full 32-bit version (sometimes called the 80386DX).


Another variant on the 80386 is the 80376. This chip is identical to the 80386SX 
except that it operates only in 32-bit protected mode and does not support paging. 
It cannot run real-mode programs and has no virtual 8086 mode capability.
It is designed for embedded process control applications.


The 80387 


The 80386 microprocessor line from Intel also boasts new coprocessors, the 80387
and the 80387SX. The interface between the 80386 CPU and the coprocessor is the
same as that defined for the 80286 and the 80287. The 80386 can also be coupled
with the slower 80287 to provide a lower-cost floating-point environment. If the system
board has the appropriate socket, the 80387 provides a significant performance
improvement over its predecessor, executing floating-point benchmarks about five
times faster.


The 80486


In 1989, the newest kid on the block was the 80486. Its basic architecture is identical 
to that of the 80386, but the following advances are part of its design: single-clock 
execution for the most basic instructions, an 8-KB cache to speed access to fre- 
quently referenced memory locations, and an on-board numeric coprocessor. Be- 
cause all the floating-point logic has been incorporated directly into the 80486, an 
80487 will never be needed. Additionally, the chip was redesigned to make it easier 
to build computers with multiple 80486 CPUs. 


Intel has indicated that the 80x86 product line will continue to evolve. The next- 
generation processor will be called the 80586 and will include capabilities beyond 
those of the 80486. However, Intel has committed to broadening the microprocessor 
line as well as lengthening it. The CPUs are also available in a wide range of clock 
speeds, from 16 through 33 megahertz, with even faster models promised for the 
future.
Summary


The first processor of the line to feature 32-bit computing was the 80386, so I will 
refer to the 80386, 80386SX, 80376, and 80486 as the “80386 family.” As you can see 
from the following table, the technology has advanced significantly beyond that of 
its predecessors; however, the road to 32-bit computing was not necessarily straight 
and narrow. Processor design has been shaped by a number of forces: the ideals of
the designers, the limits of compatibility (some stemming from the early days of the
8080), threats from the competition (both real and perceived), and other factors
such as Pascal, Multics, and UNIX. Now that I’ve shown the origins of the 80386 
family, the remainder of the book will show how it works. 


Relative Performance


                  8086/87   80286/287   80386/387   80486
Integer             1.0        2.7         9.0       20.0
Floating point      1.0        1.7        10.0       40.0


For example, the 80486 is approximately 20 times faster than the 8086/87 perform- 
ing integer calculations and approximately 40 times faster performing floating-point 
calculations. (Measurements refer to the clock rate of the chip when first introduced. 
Faster versions of all the processors have subsequently been made available.)
THE CPU ARCHITECTURE


Back in 1837, when Charles Babbage was musing over the idea of computation 
automata, he referred to his grandest scheme as an “analytical engine.” At that time, 
especially considering the mechanical aspects of Babbage’s idea, an engine was an 
apt metaphor for a computing device: fuel, combustion, and power became input, 
computation, and output. 


A Data-Processing Factory 


In recent years, the machinelike cycle led to limitations on the amount of work that 
could be accomplished. A modern microprocessor is more analogous to a factory 
than to an engine. At the heart of this data-processing factory, the computational 
engine remains, but it is surrounded by a bevy of supporting departments. 


Figure 2-1 on the following page illustrates our imaginary widget factory. It is com- 
posed of three departments: Shipping and Receiving, Materials, and Manufacturing. 
The Shipping and Receiving department deals with the world outside the factory. It 
orders truckloads of raw materials from suppliers and passes them to the Materials 
department. The goods are sorted here and warehoused until needed. The Manufac- 
turing department, the “engine” of the factory, forges the finished widgets from the 
raw materials and routes them to Shipping and Receiving, where they are sent to the 
outside world. 


The efficiency of this model lies in the parallel nature of the different activities. At 
the same time as the Materials department requests the raw goods necessary to 
build widgets, Manufacturing builds the current supply of widgets, and Shipping and 
Receiving deals with the outside world, buys unfinished goods, and ships the newly 
finished widgets. 


Conventional microprocessors, or CPUs, receive two classes of data: instructions 
and operands. The instructions tell the computer which operations to perform on 
the operands. 
Figure 2-1. Widget factory. (Diagram: raw materials and widgets flow among the
Shipping & Receiving, Materials, and Manufacturing departments.)


Like our imaginary factory, the 80386 and 80486 can work on more than one instruc- 
tion simultaneously. In the jargon of the computer industry, this is called pipelining. 


In Figure 2-2, I recast the widget factory as a data-processing factory analogous to 
the operation of a microprocessor. The Shipping and Receiving department pulls 
in bytes of data from memory. Instructions then move to the Materials department, 
where they are decoded and stored. When requested, the new instructions and 
any necessary operands pass to the Manufacturing department, the computational 
engine. The results of an operation pass back to Shipping and Receiving, which 
stores the results outside the CPU, in memory. 


Figure 2-2. Data-processing factory. (Diagram: instructions and operations flow among
the Shipping & Receiving, Materials, and Manufacturing departments.)


Although simple, this picture of the flow of information through the processor is 
fairly accurate. The three departments in the example correspond to six logical units 
in the 80386, as shown in Figure 2-3. The 80486 is somewhat more complex, adding 
an additional execution unit for floating-point operation and a cache unit that sits be- 
tween the rest of the processor and main memory. Each unit operates in parallel with 
the other units. Later sections of this chapter describe the operation of each unit. 
Figure 2-3. 80386 factory. (Diagram labels: Shipping & Receiving, Materials, and
Manufacturing; segmentation unit, paging unit, bus interface unit, instruction prefetch
unit, decode unit, and execution unit; raw data, results, and memory.)


Keeping the factory moving 


The heartbeat of a microprocessor is the clock signal. This regular electronic pulse 
keeps all units of the processor synchronized. The clock signal is a square wave os- 
cillating at a specific frequency, as shown in Figure 2-4. Instruction timings, mem- 
ory access times, and operational delays are measured in terms of clocks, or one 
complete square-wave cycle. The 80386SX is available in versions that run at either 
16 or 20 megahertz (MHz). The DX or standard 80386 is available in models that run 
at a variety of speeds, from 16 through 33 MHz. The 80486 is available in 25-MHz or 
33-MHz versions. The figure below shows a system running with a 25-MHz clock. 
At 25 MHz, each cycle lasts 40 nanoseconds. 


Figure 2-4. A square-wave cycle. (Diagram: a 25-MHz clock signal in which 1 clock
lasts 40 nsec. The actual hardware signal, on the 80386 only, is two-phase; that is, it
oscillates twice for every processor clock.)


You can compute the time it takes a single instruction to execute using the tables
provided in Appendix D. Figure the time for a single cycle and multiply it by the
clock count given for the instruction. You figure the cycle time in nanoseconds by
dividing the clock speed (in MHz) into 1000. For example, the cycle time for a 16-MHz
80386 is 1000/16, or 62.5 nanoseconds. Notice that in the 80386 (SX and DX), the actual
hardware clock device oscillates at twice the chip’s clock frequency; this is called a
two-phase clock. The 80486, however, does not use a two-phase clock.
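
The timing rule can be sketched in a few lines of Python. This is only an illustration: the function names are invented for the example, and the clock count passed in the last line is an arbitrary sample value rather than a figure taken from Appendix D.

```python
def cycle_time_ns(clock_mhz):
    """Return the length of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def instruction_time_ns(clock_count, clock_mhz):
    """Return the time needed for an instruction that takes clock_count clocks."""
    return clock_count * cycle_time_ns(clock_mhz)


print(cycle_time_ns(16))           # 62.5
print(cycle_time_ns(25))           # 40.0
print(instruction_time_ns(2, 16))  # 125.0
```

The same two functions work for any clock speed in the family; only the clock count changes from instruction to instruction.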
Performance advantages of parallelism


The pipelined operation of the 80386 and the 80486 “hides” portions of instruction 
execution time. Some operations necessary to execute an instruction occur during 
the previous instruction. The table that follows illustrates the difference between 
executing a typical instruction (ADD ECX, [EBP+8]) on the 80386 and executing it 
on a similar but imaginary processor without pipelining. 


Operation                     With Pipelining   Without Pipelining
Instruction fetch             0 clocks          2-4 clocks
Instruction decode            0 clocks          1 clock
Operand address translation   0-6 clocks        2-8 clocks
Operand read                  3 clocks          3 clocks
Execute                       2 clocks          2 clocks
Total:                        5-11 clocks       10-18 clocks


Pipelining lets the 80386 execute an instruction about twice as quickly as a similar 
processor that performs each step of the instruction sequentially. Some instructions 
that have no operands appear to execute in “zero” time because of the parallel na- 
ture of 80386 operating units. The 80486 has an even greater advantage. First of all, 
the basic processor is faster. The execute time for many instructions on the 80486 is 
a single clock, and the operand read time is only 2 clocks. In addition, the 80486 
contains an on-chip cache that holds 8 KB of the most frequently referenced infor- 
mation. If the operand address references a value that is stored in the cache, the 
operand read time is 0, meaning that the entire instruction could execute in as little 
as 1 clock cycle. 
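
As a check on the arithmetic, the totals in the 80386 table can be reproduced by summing the best-case and worst-case clock counts for each step. The sketch below assumes nothing beyond the table's own figures; the dictionary and function names are invented for the example.

```python
# (best case, worst case) clock counts for each step, copied from the table.
WITH_PIPELINING = {
    "instruction fetch": (0, 0),
    "instruction decode": (0, 0),
    "operand address translation": (0, 6),
    "operand read": (3, 3),
    "execute": (2, 2),
}
WITHOUT_PIPELINING = {
    "instruction fetch": (2, 4),
    "instruction decode": (1, 1),
    "operand address translation": (2, 8),
    "operand read": (3, 3),
    "execute": (2, 2),
}


def total_clocks(steps):
    """Sum the best-case and worst-case clock counts over all steps."""
    best = sum(low for low, _ in steps.values())
    worst = sum(high for _, high in steps.values())
    return best, worst


print(total_clocks(WITH_PIPELINING))     # (5, 11)
print(total_clocks(WITHOUT_PIPELINING))  # (10, 18)
```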


CPU Microarchitecture 


Figure 2-5 shows a block diagram of the internal operating units of the 80386. 
Although the programmer sees the 80386 as a single entity, it is instructive to see 
how the 80386 achieves the division of labor that contributes to its speed. 


Bus interface unit (BIU) 


The bus interface unit (BIU) is the 80386’s gateway to the external world. Any other 
unit that needs data from the outside asks the BIU to perform the operation. Simi- 
larly, when an instruction needs to write data to memory or to the I/O channel, the 
BIU is presented with the data and address and is asked to place it on the bus. The 
BIU deals with physical (hardware) addresses only, so operand addresses must first 
pass through the segmentation unit and the paging unit, if necessary. 
Figure 2-5. 80386 microarchitecture. (Reprinted by permission of Intel Corporation,
copyright © 1986.) (Block diagram labels: segment unit with segment descriptor cache;
paging unit with page descriptor cache; 32-bit register file; instruction unit;
instruction decode unit; code prefetch unit with prefetch queue; bus interface unit
with 32-bit address bus (24-bit on SX) and 32-bit data bus (16-bit on SX). Diagram
notes: full 32-bit architecture; flexible on-chip memory management; 32-bit registers,
bus, instruction set, and addressing modes.)


Instruction prefetch unit 


The job of the prefetch unit is relatively simple. The instruction decode unit ex- 
tracts from a 16-byte queue, and the prefetch unit tries to keep the queue full. The 
prefetch unit continually asks the BIU to fetch the contents of memory at the next 
instruction address. As soon as the prefetch unit receives the data, it places it in the 
queue and, if the queue is not full, requests another 32-bit piece of memory. The 
BIU treats requests from the prefetch unit as slightly less important than requests 
from other units. In this way, currently executing instructions requesting operands 
receive the highest priority and are not slowed down, but prefetches still occur as 
frequently as possible. The prefetch unit is notified whenever the execution unit 
processes a CALL, a JMP, or an interrupt so that it can begin fetching instructions 
from the new address. The queue is flushed whenever a CALL, a JMP, or an inter- 
rupt occurs, thus preventing the execution unit from receiving out-of-sequence 
instructions. 
Instruction decode unit


The instruction decode unit has a job similar to that of the prefetch unit. It takes in- 
dividual bytes from the prefetch queue and determines the number of bytes needed 
to complete the next instruction. A single instruction in the 80386 can be anywhere
from 1 through 15 bytes. After pulling the entire instruction from the prefetch queue,
the instruction decode unit reformats the opcode into an internal instruction format 
and places the decoded instruction into the instruction queue, which is three opera- 
tions deep. The instruction decode unit also signals the BIU if the instruction just 
decoded will cause a memory reference. This allows the operands of the instruc- 
tions to be obtained prior to the execution of the instructions. 


Execution unit 


The execution unit is the part of the CPU that does computations. It performs any 
shifts, additions, multiplications, and so on that are necessary to accomplish an in- 
struction. The register set is contained inside the execution unit. The unit also con- 
tains a logic component called a barrel shifter, which can perform multiple-bit shifts 
in a single clock cycle. The execution unit uses this capability not only in shift in- 
structions but in accelerating multiplications and in generating indexed addresses. 
The execution unit also tells the BIU when it has data that needs to be sent to the 
memory or I/O bus. 


Segmentation unit 


The segmentation unit translates segmented addresses into linear addresses. Seg- 
ment translation time is almost entirely hidden by the parallelism of the 80386. At 
most, 1 clock is required to complete the address translation. The typical case is 0 
clocks. The segmentation unit contains a cache that holds descriptor table informa- 
tion for each of the six segment registers. The segmentation unit is described further 
in Chapter 3. 


Paging unit


The paging unit takes the linear addresses generated by the segmentation unit and
converts them to physical addresses. If paging is disabled, the linear addresses of
the segmentation unit become the physical addresses. When paging is enabled, the
linear address space of the 80386 is divided into 4096-byte blocks called pages.
Each page can be mapped to an entirely different physical address. Chapter 6
discusses the paging process in detail.
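
Because a page is 4096 bytes, the low-order 12 bits of a linear address select a byte within the page and the remaining bits select the page. The sketch below illustrates that split. It is a simplification: the dictionary `page_map` is a hypothetical stand-in for the page tables, whose real structure is not modeled here, and the addresses are made up for the example.

```python
PAGE_SIZE = 4096                          # bytes per page
OFFSET_BITS = PAGE_SIZE.bit_length() - 1  # 12 bits of offset within a page


def split_linear_address(linear):
    """Return (page number, offset within page) for a 32-bit linear address."""
    return linear >> OFFSET_BITS, linear & (PAGE_SIZE - 1)


def translate(linear, page_map):
    """Map a linear address to a physical address.

    page_map is a hypothetical dict: page number -> physical base of the page.
    """
    page_number, offset = split_linear_address(linear)
    return page_map[page_number] + offset


page_map = {0x403: 0x0009F000}            # made-up mapping for one page
physical = translate(0x00403ABC, page_map)
print(hex(physical))
```

Two linear addresses in the same page always share the same page number, so they differ in the physical address space only by their offsets.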


The 80386 microprocessor uses a page table to translate every linear address to a
physical address. The paging unit contains an associative cache called the translation
lookaside buffer (TLB), which contains the entries (new addresses) for the 32
most recently used pages. If a page table entry is not found in the TLB, a 32-bit
memory read cycle fetches the entry from RAM. Under typical operating conditions,
less than 2 percent of all memory references require the 80386 to look outside the
TLB for a page table entry.
The time required to perform the translation varies between 0 and 5 clocks. Thanks
to the TLB, the typical delay is only ½ clock.


The 80486 Microarchitecture 


Figure 2-6 contains a block diagram of the 80486 microarchitecture. It is quite simi- 
lar to that of the 80386. The differences include an additional execution unit, which 
handles floating-point processing, and the cache unit, which is located where the 
BIU is in the 80386. A BIU is present in the 80486, but it will not be activated if a re- 
quest for data can be satisfied by the cache. 


The floating-point execution unit of the 80486 can operate in parallel with the stan- 
dard execution unit, with floating-point and standard operations occurring simulta- 
neously. The floating-point capabilities of the 80486 are covered later in this chapter. 


Figure 2-6. 80486 microarchitecture. (Block diagram labels: paging unit, floating-point
execution unit, bus interface unit, basic execution unit, instruction decode unit, and
code prefetch unit.)


The cache connection 


With a cache enabled, the 80486 obtains significant performance advantages over 
the 80386. This cache provides a general-purpose scratchpad for frequently used 
memory references. (Other processing units contain special-purpose caches, such 
as the TLB; these special-purpose caches exist in both the 80386 and the 80486.) 


The slowest 80486 has an instruction cycle time of 40 nanoseconds. External RAM 
that can respond to the requests of a processor that fast is prohibitively expensive. 
As a result, system designers use slower RAM and induce wait states. A wait state 
gets its name because the CPU must wait for external RAM to read or write the re- 
quested information. The cache holds duplicate copies of data in external memory. 
Reading the cached copy allows the 80486 to eliminate wait states. 
The size of the cache is only 8 KB, so the processor tries to use the space intelligently
and caches only the most frequently used memory values. By reading the
cached copy, the 80486 can get its operands immediately, without memory refer- 
ence times (at least 2 clocks) or wait states (system dependent). 


The cache is described in detail in Chapter 6. 


Instruction Set Architecture 


The execution unit presents the programmer with the model for instruction execu- 
tion. It contains the logic to process instructions, to operate on various data types, 
and to interpret control information. 


Because the 80386, 80386SX, and 80486 are 32-bit processors, the typical size of 
an operand is a 32-bit quantity. Also, because these chips process data 32 bits at a 
time, it is customary to say that they have a word size of 32 bits. Unfortunately, the 
term word is ambiguous when referring to Intel processors. 


For simplicity, word refers to a 16-bit quantity, as it did in the 8086 and 80286 envi- 
ronments. The term dword, or doubleword, refers to a 32-bit quantity. The term 
32-bit word is also used. 


Bits and bit strings 


Although the basic (default) operand size on the 80386 family of processors is 32 
bits, these processors can manipulate quantities of various sizes. The most elemen- 
tary is the bit. A bit is a single binary digit, and the 80386 family implements a num- 
ber of instructions that test and modify individual bits. Bits are addressed as an 
offset from a register or memory location. The low-order bit of the operand is desig- 
nated as bit 0, the high-order bit in the low-order byte is bit 7, and the low-order bit 
of the next byte is bit 8. Figure 2-7 shows the bits in a register and in memory. If the 
operand resides in memory, negative bit offsets can also be used. Bit -1 is the high-order
bit of the byte immediately preceding the memory address.


Figure 2-7. Bit strings. (Diagram: bit offsets within a 32-bit register, bits 31
through 0, and across memory bytes at addresses a+1, a, a-1, and a-2, each holding
bits 7 through 0.)
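
The bit-numbering rule can be expressed as a small calculation: dividing the bit offset by 8 gives the byte displacement from the base address, and the remainder gives the bit position within that byte. The following sketch assumes memory is modeled as a Python `bytearray`; the function names are invented for the example and do not correspond to any instruction.

```python
def locate_bit(base_address, bit_offset):
    """Return (byte address, bit position 0-7) for a bit offset from base_address.

    Floor division makes negative offsets land in the preceding bytes.
    """
    byte_delta, bit_in_byte = divmod(bit_offset, 8)
    return base_address + byte_delta, bit_in_byte


def read_bit(memory, base_address, bit_offset):
    """Read one bit from a bytearray that stands in for memory."""
    address, bit = locate_bit(base_address, bit_offset)
    return (memory[address] >> bit) & 1


print(locate_bit(100, 8))    # (101, 0)
print(locate_bit(100, -1))   # (99, 7)
```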
Bytes


The byte is the basic unit of addressability in the 80386 family; that is, address 2 
refers to the third byte in memory, not the third dword. A byte is an 8-bit quantity 
that can be interpreted as either a signed or an unsigned value. Figure 2-8 shows the 
layout of a byte and the range of values that it can specify. 


Bits             7 through 0, stored at address a
Signed value     -128 ≤ x ≤ 127
Unsigned value   0 ≤ x ≤ 255


Figure 2-8. Byte value range.


When a byte is interpreted as an unsigned number, it can take on a value ranging 
from 0 through 255. If a byte is interpreted as a signed number, it is assumed to be 
in two’s complement notation. This notation allows a single byte to store values 
ranging from -128 through +127. To determine the value of a two’s complement
number, follow these steps: 


1. Examine the most significant bit (MSB) of the value. If the MSB is 0, the number 
is positive and can be read as if it were an unsigned value. If the MSB is 1, the 
value is negative. 


2. You can find the absolute value of the number by taking the complement of the 
number (inverting the value of each bit) and adding 1. 


For example, consider the binary value 10111100B. The most significant bit, 1, indicates
that the number is negative. To find the absolute value, take the complement
(01000011B) and add 1. The result, 01000100B, is 68 decimal, so 10111100B represents
the value -68.
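
The two steps translate directly into code. This sketch applies them to an 8-bit pattern; the function name is invented for the example.

```python
def signed_byte_value(pattern):
    """Interpret an 8-bit pattern (0-255) as a two's complement number."""
    if not 0 <= pattern <= 0xFF:
        raise ValueError("pattern must fit in 8 bits")
    # Step 1: examine the most significant bit.
    if (pattern & 0x80) == 0:
        return pattern                      # positive: read as unsigned
    # Step 2: complement every bit, add 1, and attach the minus sign.
    magnitude = ((~pattern) & 0xFF) + 1
    return -magnitude


print(signed_byte_value(0b10111100))  # -68
print(signed_byte_value(0b01111111))  # 127
```

The same procedure works for words and dwords if the masks are widened to 16 or 32 bits.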


Words 


Words, as previously defined, are 16-bit quantities. Figure 2-9 shows the range of
values that can be stored in a word. When a word is written to memory, it is stored
in two bytes. The low-order byte is written to the specified address, and the high-order
byte is written to the next consecutive memory location.


Word values are interpreted as signed or unsigned in the same way as are byte 
values. The only differences are that bit 15 is the MSB and that there is a greater 
range of possible values. 


Bits             15 through 0, stored at addresses a+1 (high-order byte) and a (low-order byte)
Signed value     -32768 ≤ x ≤ 32767
Unsigned value   0 ≤ x ≤ 65535


Figure 2-9. Word value range.
Dwords


Dwords are 32-bit quantities. Like bytes and words, they can be signed or unsigned.
The extra bits allow representation of integral values greater than 2 billion. Figure
2-10 illustrates the range of values for dwords and the way they are stored in memory.
As with words, dwords are stored in memory low-order byte first. If the low-order
byte is stored at address m, the high-order byte is stored at address m + 3.


Bits             31 through 0, stored at addresses a+3 (high-order byte) through a (low-order byte)
Signed value     -2147483648 ≤ x ≤ 2147483647
Unsigned value   0 ≤ x ≤ 4294967295


Figure 2-10. Dword value range.


The computer industry does not agree on the proper method of breaking up large 
values into bytes for memory storage. Computers like the DEC VAX use the same 
technique as the 80386. Others, such as the IBM 370 and the Motorola 68020, store 
the high-order byte first. In homage to Jonathan Swift, the two formats are known
as “big-endian” (Motorola) and “little-endian” (Intel). New to the 80486 is an
instruction, BSWAP, for converting a dword from one form to the other. Data format
must be a consideration when porting programs from one computer to another.
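
The difference between the two byte orders is easy to see by laying out the same dword both ways. The sketch below uses Python's built-in integer conversions; the function names are invented for the example, and `swap_dword` only models the effect of reversing the four bytes, not any particular instruction's encoding or timing.

```python
def dword_bytes(value, byteorder):
    """Return the four bytes of a dword in the order they occupy memory."""
    return value.to_bytes(4, byteorder)


def swap_dword(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes(value.to_bytes(4, "little"), "big")


value = 0x12345678
print(dword_bytes(value, "little").hex())  # 78563412 (lowest address first)
print(dword_bytes(value, "big").hex())     # 12345678
print(hex(swap_dword(value)))              # 0x78563412
```

Swapping twice returns the original value, which is why a single conversion routine serves for both directions.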


Quadwords 


Quadwords are 64-bit numeric quantities. Only floating-point instructions reference 
quadword memory operands, with two exceptions: The 32-bit Multiply instruction 
generates a 64-bit value, with the high-order 32 bits in register EDX and the low- 
order 32 bits in register EAX, and the 32-bit Divide instruction accepts a 64-bit divi- 
dend stored in the same register format. 
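
The EDX:EAX register pairing can be modeled with ordinary integer arithmetic. This is a sketch of unsigned multiplication and division only. The function names are invented, and the detail that the quotient is returned in EAX and the remainder in EDX comes from Intel's description of the Divide instruction rather than from the paragraph above.

```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Unsigned 32-bit multiply; returns the 64-bit product as (EDX, EAX)."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32


def div32(edx, eax, divisor):
    """Unsigned divide of the 64-bit value EDX:EAX; returns (quotient, remainder).

    On the processor the quotient goes to EAX and the remainder to EDX.
    """
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor & MASK32)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in 32 bits")
    return quotient, remainder


edx, eax = mul32(0xFFFFFFFF, 0xFFFFFFFF)
print(hex(edx), hex(eax))  # 0xfffffffe 0x1
```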


ASCII and BCD 


In the previous examples, the values discussed represent numbers. For ASCII and 
BCD, the binary patterns represent encodings of information. (ASCII stands for 
American Standard Code for Information Interchange.) ASCII values are 7 bits of in- 
formation stored in a single 8-bit byte. The most significant bit is 0. A particular bit 
pattern represents a predefined value. For example, the binary pattern 0101011B
represents the plus character (+), 1010011B represents the letter S, and 0110101B represents
the digit 5. Appendix B contains a table of all ASCII characters.


Similarly, BCD, which stands for binary coded decimal, encodes representations of 
decimal numbers in binary format. Encoding a decimal digit requires 4 bits. Be- 
cause using only 4 bits of a byte is inefficient, two BCD digits are often stored in a 
single byte. This representation is called packed BCD. Figure 2-11 shows how values 
are stored in BCD notation. 
BCD          Decimal
0000         0
0001         1
0010         2
0011         3
0100         4
0101         5
0110         6
0111         7
1000         8
1001         9
1010-1111    Invalid

Digits 1 7 9 3 2 stored as BCD, one digit per byte:
Address      a+4   a+3   a+2   a+1   a
Digit        1     7     9     3     2

The same digits stored as packed BCD, two digits per byte:
Address      a+2   a+1   a
Digits       1     7 9   3 2


Figure 2-11. BCD storage.


Because ASCII and BCD provide ways to encode numeric values and do not have a 
fixed length for such encoding, they can be used to implement variable-precision 
numbers. The 80386 and 80486 chips support ASCII and BCD arithmetic via the 
Decimal Adjust and ASCII Adjust instructions. ASCII and BCD arithmetic are dis- 
cussed in Chapter 4. 
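
To make the two storage formats concrete, the following sketch encodes a nonnegative integer as BCD and as packed BCD. It assumes the least significant digit is stored at the lowest address, and the function names are invented for the example.

```python
def to_unpacked_bcd(number):
    """One decimal digit per byte, least significant digit at the lowest address."""
    return bytes(int(digit) for digit in reversed(str(number)))


def to_packed_bcd(number):
    """Two decimal digits per byte, least significant pair at the lowest address."""
    digits = [int(digit) for digit in reversed(str(number))]
    if len(digits) % 2:
        digits.append(0)                  # pad the high-order nibble with 0
    return bytes(digits[i] | (digits[i + 1] << 4)
                 for i in range(0, len(digits), 2))


print(to_unpacked_bcd(17932).hex())  # 0203090701
print(to_packed_bcd(17932).hex())    # 327901
```

The unpacked form needs five bytes for the five digits, while the packed form needs only three.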


The Register Set 


In addition to implementing the logic to execute instructions, the 80386 and the 
80486 have storage locations on the chip, called registers. Because they are inside 
the CPU, registers can be accessed as operands much more rapidly than can exter- 
nal memory. The general registers are used to store frequently accessed operands. 
Other registers contain special values that control specific aspects of the processor’s 
operation. 


The register set is partitioned into five classes: the general registers, which applica- 
tions use for data storage and computation; segment registers, which affect memory 
addressing; protection registers, which help support the operating system; control 
registers, which modify the behavior of the processor; and debug and test registers, 
which are used as their name implies. 


General registers 


The general registers are named EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP, as 
shown in Figure 2-12 on the following page. As a rule, any instruction can use any 
general register except ESP, either as an operand or as a pointer to an operand in 
memory. Exceptions are noted in Chapter 4 in the discussion of the instruction set. 
Figure 2-12. Base register set. (Diagram: the general registers, bits 31 through 0
with divisions at bits 16/15 and 8/7; the segment registers; and the status registers,
bits 31 through 0 with a division at bits 16/15.)


In the 80386 family, you can address selected portions of these registers. The part of 
the register accessed depends on whether you are performing an 8-bit, a 16-bit, or a 
32-bit operation. Each division of a register has a separate name. For example, EAX 
is the name of one of the 32-bit registers. The lower 16 bits are addressable as AX, 
and that half of the register is accessible as AL (the low-order 8 bits) or AH (the 
high-order 8 bits). These names are left over from the previous generation of micro- 
processors, the 8080 and 8086, as discussed in Chapter 1. The 80386 extended the 
80286 register set to 32 bits, similar to the way in which the 8086 and 80286 ex- 
tended the 8-bit registers of the 8080 to 16 bits. The 80486 did not introduce any 
changes in the register set. Figure 2-13 shows a map of the register extensions. 
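
The relationship among the register names amounts to masking and shifting a single 32-bit value. The sketch below models EAX and its subdivisions; the function names are invented for the example, and the same pattern applies to EBX, ECX, and EDX.

```python
def sub_registers(eax):
    """Return the values seen through each name of the EAX register."""
    eax &= 0xFFFFFFFF
    ax = eax & 0xFFFF              # lower 16 bits
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,    # high-order 8 bits of AX
        "AL": ax & 0xFF,           # low-order 8 bits of AX
    }


def write_al(eax, value):
    """Model an 8-bit write to AL: only the low-order byte of EAX changes."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


print({name: hex(v) for name, v in sub_registers(0x12345678).items()})
print(hex(write_al(0x12345678, 0xFF)))  # 0x123456ff
```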


Two additional registers hold status information about the current instruction stream. 
The EIP register contains the address of the currently executing instruction, and the 
EFLAGS register contains a number of fields relevant to different instructions. 


Like the other registers, EIP and EFLAGS have 16-bit components—IP and FLAGS. 
The 16-bit forms of these registers are used in virtual 8086 mode and in running 
code written for the 80286.
Figure 2-13. 80386/80486 vs. 80286 registers. (Diagram: general registers, bits 31
through 0, and segment registers, bits 15 through 0, shaded to distinguish the 80286
registers from the 80386/80486 register extensions.)


EFLAGS register


A breakdown of the EFLAGS register looks like this:


Bit(s)    Flag
18        AC*
17        VM
16        RF
14        NT
13-12     IOPL
11        OF
10        DF
9         IF
8         TF
7         SF
6         ZF
4         AF
2         PF
0         CF

* 80486 only

Bits not listed are reserved.


AC—Alignment check: This bit exists only in the 80486. When AC is set to 1, the 
80486 will expect all memory references to be aligned, so only the minimum pos- 
sible number of memory accesses are required to reference an operand. Because of 
the way the hardware memory interface works, a 32-bit operand must begin at a 
memory address divisible by 4, or it will require two memory cycles to read the 
operand. When AC=0, the 80486 will simply issue the necessary read cycles, despite 
the performance penalty. This is the standard behavior for the 80386. When AC=1, 
however, the 80486 assumes that the software is designed to run in the most efficient
way possible and will issue an alignment fault (interrupt 17, or 11H) if it finds this
condition to be untrue. Both 16-bit objects and 80286-compatible selector:offset pairs
need only be aligned on even-address boundaries. Double and extended precision
floating-point numbers must be aligned on memory addresses divisible by 8. (Note:
The AC bit applies only to code running at privilege level 3, application programs.)
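
The alignment rules reduce to a divisibility test on the operand's address. The sketch below is a simplified model: the table covers only three operand kinds, the names are invented for the example, and the cycle count assumes a 32-bit data bus that transfers one aligned dword per memory cycle.

```python
# Alignment, in bytes, expected for each operand kind when AC = 1.
REQUIRED_ALIGNMENT = {
    "word": 2,      # 16-bit objects: even addresses
    "dword": 4,     # 32-bit operands: addresses divisible by 4
    "double": 8,    # double precision floating point: divisible by 8
}


def is_aligned(address, kind):
    """True if an operand of the given kind at this address is aligned."""
    return address % REQUIRED_ALIGNMENT[kind] == 0


def memory_cycles(address, size=4):
    """Count the aligned dwords touched by an operand of size bytes."""
    first = address // 4
    last = (address + size - 1) // 4
    return last - first + 1


print(is_aligned(0x1000, "dword"), memory_cycles(0x1000))  # True 1
print(is_aligned(0x1002, "dword"), memory_cycles(0x1002))  # False 2
```

With AC=0 the misaligned case simply costs the extra cycle; with AC=1 at privilege level 3 it is the case that raises the alignment fault.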


VM— Virtual 8086 mode: When this bit is set, it indicates that the currently exe- 
cuting instruction stream is 8086 code. The implications of virtual 8086 mode are 
covered in Chapter 7. Applications cannot change the VM (virtual mode) bit, and in- 
structions that modify EFLAGS leave the VM bit unchanged. Only the task switch 
operation or an interrupt/interrupt return can alter the VM bit. 


RF— Resume flag: This bit controls whether a debug fault can be generated dur- 
ing the execution of an instruction. When an exception occurs during program exe- 
cution, the processor pushes the current CS, EIP, and EFLAGS registers onto the 
stack and transfers control to the proper exception handler. The stack image of the 
EFLAGS register has the RF bit set to 1. When the exception handler returns to the 
interrupted instruction, the RF bit is on, which prevents a recursive debug fault 
from being generated. Any other faults (such as page faults or protection faults) oc- 
cur as usual. The debug exception has the highest priority of all 80386/80486 excep- 
tions; if, therefore, an instruction causes multiple faults, the first one processed is 
the debug exception. When control returns to the interrupted instruction, the RF bit 
is set, and the instruction is completed without retriggering the debug fault. The 
processor clears the RF bit upon completion of the interrupted instruction. Chapter 
5 contains a discussion of exceptions and support for debugging. 


NT—Nested task flag: Whenever a CALL, an interrupt, a trap, or an exception 
causes a task switch, this bit gets set. The bit is set in the EFLAGS register of the 
new task and indicates that a reverse task switch (IRET) is valid. Task switching in 
the 80386 and 80486 is discussed further in Chapter 5. 


IOPL—I/O privilege level: This 2-bit field holds a value of 0-3 that indicates
the privilege level required to perform I/O instructions. Although IOPL is in the
EFLAGS register, no procedure can modify it unless the procedure is running at
privilege level 0, and then only by using the POPF or POPFD instruction.


A procedure’s current privilege level (CPL) must be equal to or more privileged than
the IOPL to execute any of the following instructions: IN, INS, OUT, OUTS, CLI, or
STI. A procedure that can execute these instructions is said to have I/O privilege.


OF — Overflow flag: When an arithmetic integer instruction is executed, the OF 
bit is set if the result is too large or too small to fit in the destination register or 
memory address. Because the OF flag reflects signed integer arithmetic, the CPU
treats the high-order bit of the destination as the sign bit, leaving one bit fewer for
the magnitude of the result. The following instructions illustrate some examples.


MOV AL, 127      ; AL = 7FH, largest 8-bit
                 ; signed integer, OF unchanged by MOV
ADD AL, 2        ; result, AL == 81H (-127)
                 ; should be AX == 0081H (129), OF = 1

MOV CX, -30000   ; CX = 8AD0H, OF unchanged by MOV
SUB CX, 7002     ; result, CX == 6F76H (28534)
                 ; should be ECX == FFFF6F76H (-37002),
                 ; OF = 1


Notice that the OF bit is ignored if unsigned arithmetic is intended. For example, 
adding 127 and 2 in register AL generates the valid unsigned result of 129. 
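
The overflow condition for addition can be stated precisely: OF is set when both operands have the same sign and the sign of the result differs. The sketch below models an 8-bit ADD under that rule; the function names are invented for the example.

```python
def to_signed8(value):
    """Read an 8-bit pattern as a two's complement number."""
    return value - 0x100 if value & 0x80 else value


def add8_overflow(a, b):
    """Model an 8-bit ADD; return (result, OF)."""
    a &= 0xFF
    b &= 0xFF
    result = (a + b) & 0xFF
    # The sign bit of the result differs from the sign bits of both operands.
    overflow = ((a ^ result) & (b ^ result) & 0x80) != 0
    return result, overflow


result, of = add8_overflow(127, 2)
print(hex(result), to_signed8(result), of)  # 0x81 -127 True
```

Read as an unsigned number, the same result is 129, which is why the flag matters only to code that treats the operands as signed.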


DF— Direction flag: The direction flag bit modifies the behavior of the string 
instructions: MOVS, STOS, LODS, CMPS, SCAS, INS, and OUTS. When DF is 0, the 
string instructions operate on incrementally higher addresses. When DF is 1, the 
memory addresses are decremented, and the operand addresses become progres- 
sively lower. The STD instruction sets the direction flag bit, and the CLD instruction 
clears the bit. 


IF —Interrupt enable flag: When this bit is set, the processor responds to exter- 
nal hardware interrupts. When the bit is reset, interrupts are disabled—that is, 
hardware interrupts are ignored. Notice that this bit does not affect the NMI inter- 
rupt. The processor always responds to faults (exceptions) and software interrupts 
regardless of the setting of the IF bit. When IF is 0, interrupts are said to be masked. 


The STI instruction sets IF to 1, and the CLI instruction clears IF to 0. The interrupt 
enable flag is also modified when an IRET is executed. POPF and POPFD instruc- 
tions modify the interrupt enable flag only if the procedure executing the instruc- 
tion has I/O privilege. 


TF—Trap flag: The trap flag bit assists in application debugging. When the TF bit 
is set, an interrupt occurs immediately after the next instruction executes. The trap 
flag is usually set by a debugger; the debugging capabilities of the 80386 family are 
covered in Chapter 5. 


SF —Sign flag: The sign flag bit changes when arithmetic or logical instructions 
are executed. The sign flag bit receives the value of the high-order bit of the result 
and, when set to 1, indicates that the result of the instruction is negative. 


MOV EDX, -1      ; sign flag unchanged by MOV
ADD EDX, 3       ; EDX == 2, SF now 0
NEG EDX          ; EDX == -2, SF now 1


ZF—Zero flag: The zero flag bit is set when arithmetic instructions generate a 0 
result.


MOV AL, 0 ; zero flag unchanged by MOV 
OR AL, AL ; AL unchanged, ZF now 1 


AF—Auxiliary carry flag: The auxiliary carry flag bit indicates that a carry out of 
the low-order nibble of the AL register occurred in an arithmetic instruction. This 
bit is used by the ASCII and BCD instructions. It allows implementation of multiple- 
digit precision decimal arithmetic. The following example assumes an ASCII encod- 
ing of the characters 4 and 7. 


MOV AL, '4' ; AL = 34H, AF unchanged by MOV 
ADD AL, '7' ; AL == 6BH, AF now 0 (4 + 7 = 0BH, no carry out of bit 3) 
AAA ; low nibble > 9: AL = 1, AH = AH + 1, AF = CF = 1 


PF—Parity flag: The parity flag bit is set to 1 when an arithmetic instruction 
results in a value whose low-order byte has an even number of 1 bits. For example, 
if you issued the following instructions, the resulting parity flag bit would be 1. 


MOV AH, 91H ; AH = 10010001B, PF unchanged by MOV 
ADD AH, 05H ; AH == 10010110B (four 1 bits), PF now 1 


CF—Carry flag: The carry flag bit is set when the result of an arithmetic opera- 
tion is too large or too small for the destination register or memory address. It is 
similar in operation to the OF bit but indicates an unsigned overflow of the 
destination. 


MOV AL, 127 ; AL = 7FH, CF unchanged by MOV 
ADD AL, 2 ; AL == 81H, CF now 0 
ADD AL, AL ; AL == 02H, CF now 1 (the 
; mathematical result is 102H, but no 
; value is "carried" into the AH register) 
MOV AL, 3 ; CF unchanged by MOV 
SUB AL, 4 ; AL == FFH, CF now 1 (borrow bit) 
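

The flag rules described above can be modeled outside the processor. The following is a minimal sketch in Python (not assembly) of how an 8-bit ADD might derive each status flag; the function name add8 and the dictionary layout are my own illustrative choices, not anything defined by Intel.

```python
def add8(a, b):
    """Model an 8-bit ADD: return (result, flags) for operands a and b."""
    a &= 0xFF
    b &= 0xFF
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                     # unsigned carry out of bit 7
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),   # carry out of bit 3
        "ZF": int(result == 0),                      # result is zero
        "SF": result >> 7,                           # copy of the high-order bit
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of 1 bits
        # Signed overflow: both operands have a sign the result does not share.
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags
```

Calling add8(127, 2) reproduces the earlier example: the result is 81H with CF clear (a valid unsigned 129) but OF set (an invalid signed result).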


Segment registers 


The segment registers hold the values that affect which portions of memory a pro- 
gram uses. Four segment registers are used under specific conditions, and two are 
available as pointers to frequently used areas of memory. The CS, DS, SS, and ES 
registers were inherited from the 80286 and perform the same functions as they did 
in that CPU. Two additional registers, FS and GS, were introduced in the 80386 and 
are also found in the 80486. 


Associated with the segment registers is a descriptor cache, which holds the starting 
address of the memory segment and other related information. Chapter 3 details the 
relationship between segments and memory addresses. The descriptor cache for the 
segment registers is not accessible to the programmer; only the 16-bit register por- 
tion can be accessed directly. Figure 2-14 illustrates the segment registers and the 
internal descriptor cache. 


Figure 2-14. Segment registers. 


Protection model registers 
Four registers support the protection model of the 80386 family. (See Figure 2-15.) 


Figure 2-15. Protection model registers. 


The protection model registers are: 
GDTR— Global Descriptor Table Register 
IDTR—Interrupt Descriptor Table Register 
LDTR—Local Descriptor Table Register 
TR—Task Register 


The GDTR and IDTR contain linear base addresses that point to the start of the 
GDT and the IDT descriptor tables. They also contain limit fields that describe the 
size of the GDT and IDT tables. 


The LDTR and TR hold 16-bit selector values, similar to the segment registers. As 
with the segment registers, an inaccessible descriptor cache exists for both the LDTR 
and TR. The LDTR holds a selector for an LDT descriptor, and the TR holds a selec- 
tor for the TSS (task state segment) descriptor of the currently executing process. 
Chapter 5 discusses how these registers work. 


Control registers 


The control registers regulate the paging and numeric coprocessor operation of the 
80386 and additionally control cache operations in the 80486. A general description 
of the registers follows; refer to the specific chapters on paging and coprocessors for 
more detailed information. A programmer can read or modify control registers only 
by using instructions of the form MOV reg, CRx (read) or MOV CRx, reg (write), 
where reg stands for one of the general registers. A procedure must be running at 
privilege level 0 to execute these instructions. 


CR0—Control register 0 

The following illustration shows the contents of control register 0. The LMSW and 
SMSW instructions allow access to the low-order 16 bits of CR0 as the machine 
status word. 


 31  30  29  28-19  18  17  16  15-6  5   4   3   2   1   0 
 PG  CD  NW  res    AM  res WP  res   NE  ET  TS  EM  MP  PE 

 (res = reserved) 


PG—Paging: Paging is enabled by setting the PG bit to 1. Typically, the operating 
system does this once, at initialization. Chapter 6 discusses the paging mechanism. 


CD—Cache disable: The cache disable bit is present in the 80486 only. When it is 
set to 1, cache filling is disabled, and a reference to a memory address outside the 
cache will not cause new values to be read into the cache. Clearing the CD bit to 0 
enables cache fills. Notice that operands will continue to be read from the cache 
even when CD=1. To completely turn off the cache, you must set CD and NW to 1 
and then flush the cache using the INVD instruction. 


NW—No write-through: The NW bit is also 80486 specific and is normally set to 
the same value as CD: NW=1 when caching is disabled, and NW=0 for normal cache 
operation. The state CD=1, NW=0 is useful, however, to temporarily disable cache 
fills while leaving write-through enabled. 


AM—Alignment mask: The AM bit is present only in the 80486. When set to 1, it 
enables the AC (alignment check) bit in the EFLAGS register. When AM=0, the AC 
bit is ignored. 


WP— Write protect: The write protect bit is present only in the 80486. It affects 
the behavior of the paging unit. When WP is cleared to 0, the operation of the 
80486 is compatible with that of the 80386. When WP is 1, a supervisor-mode write 
to a read-only page will cause a page fault. See Chapter 6 for more information on 
paging. 


NE—Numerics exception: The NE bit is present only in the 80486. When it is set 
to 1, unmasked floating-point exceptions vector through interrupt 16 (10H). Clearing 
NE to 0 puts the 80486 into a DOS compatibility mode, in which floating-point 
exceptions are reported through an external interrupt (IRQ 13 on PC-compatible 
systems). 


ET— Extension type: In the 80486, this bit is always 1 because the floating-point 
coprocessor extension is always present in the 80486. The 80386 sets the ET bit to 1 
at boot time if the processor determines that an 80387 is present. If this bit is 0, the 
coprocessor either is an 80287 or is not present at all. When ET is 1, the 80386 uses a 
32-bit protocol to communicate with the coprocessor; otherwise, it uses a 16-bit 
protocol. 


TS—Task switched: The CPU sets the TS bit when a task switch operation oc- 
curs. When the TS bit is on, the next coprocessor instruction causes a trap to the 
operating system. This feature lets the operating system implement multitasking 
without requiring the operating system to save the state of the math coprocessor ev- 
ery time a task switch occurs. The context of the floating-point unit is more than 100 
bytes, so saving the coprocessor state at every task switch would waste valuable 
CPU time. 


EM— Emulate math coprocessor: When this bit is set, floating-point instruc- 
tions that would normally control coprocessor operation trap to the operating sys- 
tem instead. This bit is most useful in the 80386, where the numerics processor 
might be missing. Proper use of this bit allows programmers to write applications as 
if a coprocessor were present. If an 80287 or 80387 is present, the operating system 
initializes the EM bit to 0, and the application’s floating-point instructions will be 
executed by the coprocessor. If an 80287 or 80387 is not present, the operating sys- 
tem sets the EM bit to 1. Then, when an application executes a floating-point in- 
struction, the 80386 will trap back to the operating system, which either emulates 
the instruction in software or passes the operands to other floating-point hardware 
in the system. 


MP— Monitor coprocessor: The MP bit affects the operation of the WAIT in- 
struction, as described in Chapter 8. 


PE—Protect enable: Setting the PE bit places the processor into protected mode. 
Typically, this is done once, at initialization. In the 80386 and 80486, it is possible to 
switch the CPU back into real mode after entering protected mode. (This was not 
possible in the 80286.) Some implementations of the OS/2 operating system use this 
technique to allow real-mode MS-DOS programs to run concurrently with pro- 
tected-mode OS/2 applications. 
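

As a recap of the CR0 fields just described, the sketch below (in Python, as a model only) decodes a 32-bit CR0 image into named bits and extracts the machine status word. The helper names decode_cr0 and machine_status_word are hypothetical; on real hardware the value would be obtained with a privileged MOV from CR0.

```python
# Bit positions of the CR0 fields on the 80486 (CD, NW, AM, WP and NE are
# 80486-only; the 80386 treats those positions as reserved).
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}

def decode_cr0(value):
    """Return a dict mapping each CR0 field name to 0 or 1."""
    return {name: (value >> bit) & 1 for name, bit in CR0_BITS.items()}

def machine_status_word(value):
    """Low-order 16 bits of CR0, the part visible through SMSW."""
    return value & 0xFFFF
```

For instance, decode_cr0(0x80000011) reports PG, ET, and PE set, which corresponds to protected mode with paging enabled.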


CR1—Control register 1 
Control register 1 is not used in the 80386 or 80486 and is reserved for future Intel 
processors. 


CR2—Control register 2 


When a page fault occurs, CR2 is loaded with the linear address that caused the ex- 
ception. Refer to Chapter 6 for more details on paging. 


CR3—Control register 3 

The paging hardware also uses this register. CR3 contains the twenty high-order 
bits of the physical address of the starting point of the page directory. In the 
80386, the twelve low-order bits should always be zero; in the 80486, bits 3 and 4 
of the twelve low-order bits are also used. Bit 3 controls page write-through (PWT), 
and bit 4 controls page cache disable (PCD). The implementation of paging is 
covered fully in Chapter 6. 
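

The layout of CR3 can be illustrated with a short Python sketch that splits a 32-bit CR3 image into the page directory base and the two 80486 control bits. The function name split_cr3 is an assumption made for this example.

```python
def split_cr3(cr3):
    """Separate a CR3 value into its page directory base and 80486 bits."""
    return {
        "page_directory_base": cr3 & 0xFFFFF000,  # twenty high-order bits
        "PWT": (cr3 >> 3) & 1,                    # page write-through (80486)
        "PCD": (cr3 >> 4) & 1,                    # page cache disable (80486)
    }
```

Because the low-order twelve bits are not part of the base, the page directory always begins on a 4-KB boundary.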


Debug and test registers 


The 80386 contains eight debug registers and two test registers. The 80486 adds 
three test registers. The test registers TR3 through TR5 control testing of the cache; registers 
TR6 and TR7 allow diagnostic software to test the translation lookaside buffer (TLB). 


The debug registers, labeled DR0 through DR7, allow the 80386 and 80486 to implement a 
hardware breakpoint capability that previously required an external in-circuit emula- 
tor. By setting the address registers (DR0 through DR3) and the control register (DR7), the 
programmer can halt the CPU when a particular memory location is read from, writ- 
ten to, or executed. The breakpoints are noninvasive (they don’t require modification 
of the program under debug), and they are also real-time (they don’t degrade the 
performance of the program). The debugging techniques using the debug registers 
are described in Chapter 5. 


Floating-Point Support 


Originally, the 8086 family of microprocessors did not support floating-point arith- 
metic directly. Instead, separate chips, optimized for numeric processing, were 
offered as options. The 80486 is the first chip to support floating-point arithmetic on 
the main CPU. Its floating-point instruction set is completely compatible with the 
80387 coprocessor that was designed to support the 80386. Actually, the 80386 will 
work with either the 80387 or the 80287. The 80287 is a slower chip with a 16-bit 
interface, originally designed for use with the 80286. Floating-point performance 
of the 80287 is approximately 320,000 whetstones when running at 10 MHz. (A 
whetstone is a relative performance value that is used to compare the throughput 
of floating-point processors.) The 32-bit 80387 offers higher performance. This 
processor is software compatible with the 80287 and can execute about 1,800,000 
whetstones when running at 16 MHz. An 80486 operating at 25 MHz can run the 
same benchmark at approximately 4,000,000 whetstones. Appendix F notes the dif- 
ferences between the 80287 and the 80387. In the following text, I will use the term 
NDP (numeric data processor) to refer to the 80287, the 80387, or the floating-point 
capabilities of the 80486. Exceptions will be noted by an explicit processor 
reference. 


The NDP is another source of parallelism in the system. As soon as the execution 
unit sees a floating-point instruction, it passes the instruction to the NDP. The exe- 
cution unit begins executing subsequent instructions regardless of how long the 
NDP takes to complete its operation. Of course, if the execution unit encounters 
another floating-point instruction, it must wait for the NDP to complete the current 
operation before it can begin a new one, and the main processor might be forced 
to wait. 


To use a value computed by the 80387 and written to memory, you must ensure that 
the 80387 has completed the write operation. The FWAIT instruction ensures syn- 
chronization between the 80386 and the 80387. (FWAIT is a synonym for the WAIT 
instruction. FWAIT is commonly used to indicate waiting for the NDP.) Because the 
NDP is not a physically separate processor in the 80486, use of FWAIT is not neces- 
sary. However, if you are writing code that might be executed on an 80386, you 
must use the FWAIT instruction. 


If a coprocessor is absent, the 80386 allows an operating system to emulate one and 
remain invisible to the application. For additional details on coprocessor emulation, 
see the discussion of the EM bit in control register 0 earlier in this chapter. 


Additional data formats 


The NDP adds direct hardware support for three floating-point number formats and 
one BCD integer format. The NDP also supports three integer formats in common 
with the basic execution unit. These are the 16-bit, 32-bit, and 64-bit two’s comple- 
ment (signed) integers previously mentioned. Figure 2-16 on the following page 
shows the additional numeric formats. 


Floating-point numbers 

The NDP supports three floating-point formats. This allows a programmer to make 
compromises between the amount of memory required and the precision of the 
results. The short real format lets programmers specify numbers of about seven 
decimal digits of accuracy. This format is also known as single-precision because a 
short real number fits into a single 32-bit machine word. The long real format, also 
known as double-precision, represents a floating-point number of up to 15 decimal 
digits of accuracy. Holding a long real number requires a double machine word (64 
bits). The third format is called temp (temporary) real or extended-precision. Temp 
real numbers are 80 bits and represent about 19 decimal digits of precision. 


Just as scientific notation represents floating-point quantities in decimal notation 
(for example, 4.74 x 10^3), the Intel floating-point format is a type of binary scientific 
notation. The general format of a floating-point number is ±f x 2^e, where f repre- 
sents a binary fraction and e is an exponential power of 2. Three fields are required 
to make up a floating-point number: the sign, the exponent, and the fraction, or 
significand. 


Figure 2-16. Floating-point formats. 


The sign field is a single bit that is set to 1 to indicate a negative number and reset 
to 0 for a positive value. No value manipulation is necessary to change the number 
from positive to negative (or vice versa) other than toggling the sign bit. (Such ma- 
nipulation is necessary when dealing with the two’s complement notation of the 
integers.) This notational format allows the representation of +0.0 and -0.0, which is 
useful in certain circumstances. 


The exponent field represents a multiplier of 2^n. This field ranges from 8 bits in the 
short real format to 11 bits in the long real format to 15 bits in the temp real format. 
To accommodate negative exponents (that is, negative powers of 2), the value in the 
exponent field is biased; that is, the actual exponent is determined by subtracting 
the appropriate bias value from the value in the exponent field. For example, the 
bias for short reals is 127. If the value in the exponent field is 130, the exponent 
represents a value of 2^(130-127), or 2^3. The bias for long reals is 1023, and the bias 
for temp reals is 16383. The values 0 and all 1’s (binary) are reserved for 
representing special values and cannot be used to represent floating-point numbers. 


The significand field contains the fractional part of the floating-point number. The 
significand occupies 23 bits in short reals, 52 bits in long reals, and 64 bits in temp 
reals. Figure 2-17 shows how to interpret floating-point fractions. The significand is 
encoded in two ways. In temp real format, the significand field holds the binary 
fraction in the form s0.s1s2...s63, where sn is bit n of the significand. 


The authors of the IEEE-754 format took advantage of a representational trick to 
squeeze out an extra bit of precision in short real format and in long real format. 
A review of scientific notation shows that the values 40.103 x 10^7, 4.0103 x 10^8, and 
0.040103 x 10^10 all represent the same number. A binary notation has the same 
property. 


Shifting the fraction by one position can be compensated for by incrementing or 
decrementing the value of the exponent. Because a binary number consists of only 
0’s and 1’s, the designers of the floating-point format decided that the fractional por- 
tion of the short and long reals would be shifted left until the most significant bit 
was 1. Because this bit was now defined as 1, there was no point in storing it, and 
it was assumed to exist. The fraction for a short or long real, therefore, has the value 
1.s0s1s2...sn, where n is 22 for short reals and 51 for long reals. 


[Figure: a decimal fraction (37.2101 decimal) shown beside a binary fraction 
(110.1001B, or 6.5625 decimal), each with its point marked. A normalized fraction 
has a single digit before the binary point. In short real and long real formats, the 
significand holds the fraction with the most significant bit implied; in temp real 
format, the fraction is directly represented.] 


Figure 2-17. Floating-point fractions. 


Short real: 

 31     30       23 22                0 
 Sign | Exponent  | Significand         Single precision 

Absolute value = 1.s0s1...s22 x 2^(exp - 127) 


The bias for the short real exponent is 127. The significand includes the “implied 1” 
bit and allows a precision of about seven decimal digits. Representative values 
range from ±1.18 x 10^-38 through ±3.40 x 10^38. 
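

To make the short real layout concrete, here is a minimal Python sketch that splits a single-precision value into its three fields and rebuilds the value from them using the bias of 127. The function names are illustrative; the sketch relies only on Python's standard struct module and handles normalized numbers only.

```python
import struct

def decode_short_real(x):
    """Return (sign, exponent field, 23-bit fraction) of x as a short real."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exp_field, fraction

def short_real_value(sign, exp_field, fraction):
    """Rebuild a normalized value: the leading 1 is implied, the bias is 127."""
    significand = 1 + fraction / 2**23
    return (-1) ** sign * significand * 2.0 ** (exp_field - 127)
```

The value 6.5625 (110.1001B) normalizes to 1.101001B x 2^2, so its exponent field is 129 and its stored fraction begins with the bits 101001.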


Long real: 

 63     62       52 51                0 
 Sign | Exponent  | Significand         Double precision 

Absolute value = 1.s0s1...s51 x 2^(exp - 1023) 


The bias for the long real exponent is 1023. The significand includes the “implied 1” 
bit and allows a precision of about 15 decimal digits. Representative values range 
from ±2.23 x 10^-308 through ±1.79 x 10^308. 


Temp real: 

 79     78       64 63                0 
 Sign | Exponent  | Significand         Extended precision 

Absolute value = s0.s1...s63 x 2^(exp - 16383) 


The bias for the temp real exponent is 16383. The significand represents the frac- 
tional portion of the value (with no implied bits) and allows a precision of about 19 
decimal digits. Representative values range from ±3.37 x 10^-4932 through ±1.18 x 
10^4932. 
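

Python has no native 80-bit type, but the temp real layout can still be sketched by decoding the ten bytes by hand. The example below assumes the little-endian byte order used in memory by these processors and handles normal numbers only; the function names are hypothetical, and exact rational arithmetic is used so that no precision is lost.

```python
from fractions import Fraction

def decode_temp_real(raw):
    """Split 10 little-endian bytes into (sign, exponent field, significand)."""
    if len(raw) != 10:
        raise ValueError("a temp real occupies exactly 10 bytes")
    significand = int.from_bytes(raw[:8], "little")   # bits 63..0, s0 is bit 63
    sign_exp = int.from_bytes(raw[8:], "little")      # bits 79..64
    return sign_exp >> 15, sign_exp & 0x7FFF, significand

def temp_real_value(sign, exp_field, significand):
    """Value of a normal temp real: s0.s1...s63 x 2^(exp - 16383)."""
    value = Fraction(significand, 2**63) * Fraction(2) ** (exp_field - 16383)
    return -value if sign else value
```

Notice that the leading 1 is stored explicitly: the value 1.0 has a significand of 8000000000000000H and an exponent field of 3FFFH.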


Special floating-point values: In addition to intuitive values such as 3.14159 and 
6.03 x 10^23, the NDP represents values that arise under unusual conditions. These 
values are called infinities, denormals, and NaN’s. (NaN stands for “not a number.”) 


Infinity, positive or negative, is represented by a value whose exponent field is all 1’s 
and whose fraction is 1.0B. Notice that in short and long real numbers, 1.0B is repre- 
sented by a significand of all 0’s, whereas in temp real numbers, the significand is a 
binary 10000000...0B. 


Denormals are values that are too small to be represented in the standard (or nor- 
malized) fashion. Denormals are represented by a value with an exponent field of 0 
and any nonzero value in the significand. A floating-point number with both an ex- 
ponent of 0 and a significand of 0 represents 0.0. 


NaN’s are invalid representations of floating-point numbers. They are identified by 
an exponent field of all 1’s and a significand other than the one representing infinity. 
The two kinds of NaN’s are the signaling NaN and the quiet NaN. A signaling NaN 
has a fraction of the form 1.0xxx...xB, where x represents any bit value. Notice that 
the bits represented by the x’s cannot all be zeros, as that value is reserved for 
infinities. The NDP generates an exception whenever a signaling NaN is used. The 
NDP never creates a signaling NaN, but a programmer can use one to indicate some 
error condition such as an uninitialized floating-point variable. The quiet NaN has a 
fractional format of 1.1xxxxxB. Recall that the leading 1 is implied in the significand 
of short and long reals but must be present in temp reals. The 80387 generates a 
quiet NaN instead of a numeric result whenever a floating-point instruction causes 
an invalid operation. Any instruction that receives either type of NaN as an operand 
generates a NaN as a result. The following table lists special values used by the NDP. 


Sign   Exponent   Fraction      Value 

x      11...11B   1.1xx...xxB   Quiet NaN 
x      11...11B   1.0xx...xxB   Signaling NaN 
x      11...11B   1.00...00B    Infinity 
x      00...00B   0.xx...xxB    Denormals 
x      00...00B   0.00...00B    Zero 


Except for the signaling NaN (in which at least one of the x’s must be a 1), the x in- 
dicates that it makes no difference whether the bit is 0 or 1. The 1 before the binary point 
in the fraction is physically present only in temporary real format. It is implied in 
the short real and long real formats. Denormals are recognized in the short and long 
formats by the 0 exponent value. 
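

The table above translates directly into a classification routine. The following Python sketch applies it to a 32-bit short real bit pattern; the function name classify_short_real and the returned strings are my own choices for illustration.

```python
def classify_short_real(bits):
    """Classify a 32-bit short real pattern using the special-value table."""
    exp_field = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF            # stored bits; the leading 1 is implied
    if exp_field == 0xFF:                 # exponent of all 1's
        if fraction == 0:
            return "infinity"
        # The most significant stored fraction bit separates the two NaN kinds.
        return "quiet NaN" if fraction & 0x400000 else "signaling NaN"
    if exp_field == 0:                    # exponent of all 0's
        return "zero" if fraction == 0 else "denormal"
    return "normal"
```

The sign bit is never examined, which matches the x in the sign column of the table.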


BCD integer 

The other new data type that the NDP supports is a packed decimal integer of 18 
digits stored in 10 consecutive bytes of memory. The high-order bit of the high- 
order byte is interpreted as a sign bit. A 0 indicates a positive number, and a 1 indi- 
cates a negative number. The rest of the high-order byte is unused. The remaining 
bytes each contain two BCD digits. 


 79    78-72   71-64   63-56   55-48   47-40   39-32   31-24   23-16   15-8   7-0 
 Sign  Unused  d17d16  d15d14  d13d12  d11d10  d9d8    d7d6    d5d4    d3d2   d1d0 


The value range of the BCD integer is 0 through ±999,999,999,999,999,999. Program- 
mers who work with BCD numbers might want to run the NDP with the precision 
exception unmasked (PM bit, bit 5, in the control word register). Because BCD 
formats often represent monetary values, it is important to avoid losses due to 
rounding or truncation. 
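

The packed decimal layout can be sketched in Python as follows. The functions pack_bcd and unpack_bcd are hypothetical helpers, not NDP instructions; they assume the low-order digit pair is stored in the lowest-addressed byte and the sign in the high-order bit of the tenth byte.

```python
def pack_bcd(value):
    """Encode an integer as a 10-byte packed BCD image (low byte first)."""
    if abs(value) > 10**18 - 1:
        raise ValueError("packed BCD holds at most 18 decimal digits")
    digits = f"{abs(value):018d}"
    # Each byte holds two decimal digits; reading the pair as hex yields BCD.
    body = bytes(int(digits[i:i + 2], 16) for i in range(16, -1, -2))
    sign = 0x80 if value < 0 else 0x00
    return body + bytes([sign])

def unpack_bcd(raw):
    """Decode a 10-byte packed BCD image back to a Python integer."""
    magnitude = int("".join(f"{b:02x}" for b in reversed(raw[:9])))
    return -magnitude if raw[9] & 0x80 else magnitude
```

For example, pack_bcd(-1234) produces the bytes 34H, 12H, seven bytes of 00H, and a final sign byte of 80H.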


NDP register set 


The NDP contains a register file of eight 80-bit floating-point registers and a num- 
ber of status registers. Floating-point instructions refer to these registers rather than 
to the general registers EAX, ESI, and so on. (See Figure 2-18.) 


Figure 2-18. 80387 register file. 


Unlike the general registers of the 80386 and 80486, however, the NDP’s floating- 
point registers are addressed as a stack. The current top-of-stack (the value most re- 
cently pushed) is indicated by a field in the status word register and is addressed as 
ST or ST(0). The next register (the previous value pushed) is ST(1), and so on. This 
is best illustrated by the following example. 


Assume that the configuration in Figure 2-19 shows the initial state of the NDP. 
Register 2 is designated as the current top-of-stack, but nothing is stored in the 
registers. The TW (tag word) register holds a 2-bit field for each register, marking it 
as valid, zero, special, or unused. To evaluate the polynomial y = 3x^2 - 7x + 4, we will 
use the following code fragment. (Figure 2-19 shows how the function evaluation 
progresses on the floating-point stack.) 


x DD ? ; short real variable "x" 
y DD ? ; result of computation 
const DW ? ; memory word for integer constants 


FLD x ; load x to top of stack 
FLD ST(0) ; duplicate copy of x 
FMUL ST(0) ; square copy of x at top of stack 
MOV const, 3 ; integer multiplier 
FIMUL const ; multiply top of stack by 3 
MOV const, 7 ; integer constant 
FILD const ; load 7 to top of stack 
FMULP ST(2), ST ; ST(2) = x * 7, pop ST 
FSUBRP ST(1), ST ; ST(1) = ST - ST(1), pop ST 
MOV const, 4 ; integer constant 
FIADD const ; 3x^2 - 7x + 4 
FSTP y ; store result and pop, clearing stack 


Figure 2-19. Evaluating a polynomial. 


The NDP register addressed by ST(n) varies according to the value of the TOP field 
in the status word register. The following section describes the other fields in the 
status word register. 
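

The relationship between ST(n) and the physical registers can be modeled in a few lines of Python. The sketch below is a simplified simulation, not a description of the hardware: the class NdpStack and the function evaluate are invented for this example, empty registers are marked with None, and the polynomial code fragment shown earlier is replayed step by step with TOP initially 2.

```python
class NdpStack:
    """Eight registers addressed relative to a 3-bit TOP field."""

    def __init__(self, top=0):
        self.regs = [None] * 8            # None marks an empty register
        self.top = top

    def _phys(self, i):
        return (self.top + i) & 7         # ST(i) maps to register (TOP + i) mod 8

    def st(self, i=0):
        value = self.regs[self._phys(i)]
        if value is None:
            raise IndexError("stack underflow: ST(%d) is empty" % i)
        return value

    def set_st(self, i, value):
        self.regs[self._phys(i)] = value

    def push(self, value):
        new_top = (self.top - 1) & 7      # a push decrements TOP
        if self.regs[new_top] is not None:
            raise OverflowError("stack overflow")
        self.top = new_top
        self.regs[new_top] = value

    def pop(self):
        value = self.st(0)
        self.regs[self.top] = None
        self.top = (self.top + 1) & 7     # a pop increments TOP
        return value


def evaluate(x, top=2):
    """Replay the polynomial fragment; return (y, final TOP)."""
    ndp = NdpStack(top)
    ndp.push(x)                               # FLD x
    ndp.push(ndp.st(0))                       # FLD ST(0)
    ndp.set_st(0, ndp.st(0) * ndp.st(0))      # FMUL ST(0)
    ndp.set_st(0, ndp.st(0) * 3)              # FIMUL const (3)
    ndp.push(7.0)                             # FILD const (7)
    ndp.set_st(2, ndp.st(2) * ndp.st(0))      # FMULP ST(2), ST
    ndp.pop()
    ndp.set_st(1, ndp.st(0) - ndp.st(1))      # FSUBRP ST(1), ST
    ndp.pop()
    ndp.set_st(0, ndp.st(0) + 4)              # FIADD const (4)
    return ndp.pop(), ndp.top                 # FSTP y
```

With TOP starting at 2, the third push wraps TOP from 0 around to 7, and the final pop returns TOP to 2, leaving the stack as it began.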


Status word register 
The status word register can be illustrated as follows: 


 15  14  13-11  10  9   8   7   6   5   4   3   2   1   0 
 B   C3  TOP    C2  C1  C0  ES  SF  PE  UE  OE  ZE  DE  IE 


B—Busy: This bit is 1 when the NDP is executing an instruction or when an un- 
masked exception (bits 0-5) is indicated. To make this bit available for testing, 
execute the instruction FNSTSW AX, which copies the status word register to the 
AX register. 


C3, C2, C1, C0— Condition codes: The NDP sets these bits when a floating-point 
compare, test, examine, or math instruction is executed. The various combinations 
that occur are discussed under the relevant instructions in Chapter 8. 


TOP—Top-of-stack: This field indicates which of the floating-point registers is 
currently identified as the top-of-stack. When a new value is pushed onto the 
register stack, the value of TOP is decremented by 1. When a value is popped from 
the stack, TOP is incremented by 1. The results of the increment or decrement are 
truncated to three bits to allow addressing of eight floating-point registers. 


ES—Error summary: The NDP sets this bit to 1 whenever a floating-point in- 
struction generates an unmasked exception. Indication of such an exception is 
found in bits 0-5 of this register. The exception masks themselves are located in 
the control word register. 


SF—Stack fault: The NDP sets this bit to 1 if an instruction causes a stack 
overflow by pushing too many operands or a stack underflow by popping the stack 
when there are no more values. This field does not exist in the 80287, so floating- 
point code that must run on any possible 80386 configuration should not rely on 
having the bit. A stack fault also results in an invalid operation exception. 


Before discussing each field, let’s note a couple of things about bits 0-5 of the status 
word register. These bits correspond to exceptional conditions that can occur while 
floating-point instructions are being executed. 


Whenever a condition represented by an exception bit occurs, the NDP first sets 
the appropriate bit in the status word register. Next, it checks the corresponding 
mask bit in the control word register. If the mask bit is 0 (unmasked), the NDP trig- 
gers the numeric exception. If the mask bit is 1 (masked), the NDP continues by 
executing the next instruction. 


Additionally, the exception bits are “sticky.” Once set, they remain set until the pro- 
grammer clears them (for example, with the FCLEX instruction) or loads the status 
word register with a new value. This lets the programmer write a series of numeric 
instructions and place a test for errors at the end of the instruction stream rather 
than after each instruction. 


PE—Precision exception: This exception occurs when the NDP cannot repre- 
sent the exact result of a floating-point instruction. For example, the fraction 1/3 
cannot be represented exactly as a decimal fraction because it produces an infinitely 
repeating result. Any finite representation, such as 0.3, 0.333333333, or even 
0.333333333333333333333333333333, is only an approximation. Similarly, the NDP 
cannot represent this fraction exactly in binary format. Dividing 1 by 3 results in the 
infinitely repeating binary fraction 0.010101...B. 


This exception also occurs when a temp real number is converted to a lower preci- 
sion and bits are lost in the conversion. 


The precision exception is almost always masked because a rounded or truncated 
result will suffice in most cases. 


UE—Underflow exception: The underflow exception is triggered when the 
result of an operation is too small for the NDP to represent. For example, the smallest 
value that can be represented in the 80-bit extended-precision format is 3.37 x 
10^-4932. Attempting to square a number such as 10^-3000 results in an underflow 
exception. 


OE— Overflow exception: This exception is the converse of the underflow ex- 
ception. It occurs when the result of a floating-point operation is too large for the 
NDP to represent. Like the precision exception, UE and OE can be generated when 
a normally representable number is converted to a format in which it is not 
representable. 


ZE—Zero divide exception: Whenever division by zero is attempted, the ZE ex- 
ception occurs. This exception can also be caused by floating-point operations other 
than the divide instruction, such as taking the logarithm of zero (FYL2X) or 
extracting the exponent of zero (FXTRACT). 


DE—Denormal exception: This exception occurs whenever an operand of a 
floating-point instruction is a denormal. Denormal numbers are discussed earlier in 
this chapter. 


IE— Invalid operation exception: This exception traps all error conditions not 
handled by the previously discussed exceptions. These can include arithmetic faults 
(such as an attempt to take the square root of a negative number) or programmer 
faults (such as specifying a register that contains no value as an instruction 
operand). 
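

The status word fields and the masking rule described in this section can be summarized with a small Python model. The names decode_status_word and unmasked_exceptions are assumptions for this sketch; on the processor, the status word would be read with FNSTSW and the control word with FNSTCW.

```python
STATUS_FLAGS = {
    "IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5,
    "SF": 6, "ES": 7, "C0": 8, "C1": 9, "C2": 10, "C3": 14, "B": 15,
}

def decode_status_word(sw):
    """Return every single-bit field plus the 3-bit TOP field (bits 13-11)."""
    fields = {name: (sw >> bit) & 1 for name, bit in STATUS_FLAGS.items()}
    fields["TOP"] = (sw >> 11) & 7
    return fields

def unmasked_exceptions(sw, cw):
    """List exception bits set in sw whose mask bit in cw is 0."""
    pending = sw & ~cw & 0x3F             # bits 0-5 line up in both registers
    return [name for name in ("IE", "DE", "ZE", "OE", "UE", "PE")
            if pending & (1 << STATUS_FLAGS[name])]
```

Because the exception bits are sticky, a program can run a sequence of instructions and call a routine like unmasked_exceptions once at the end.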


Control word register 

A programmer modifies the CW (control word) register of the NDP to alter its 
behavior. The format of the control word register and the definition of each field 
follows: 


 15-13  12  11-10  9-8  7-6  5   4   3   2   1   0 
 x      0   RC     PC   x    PM  UM  OM  ZM  DM  IM 


Bit 12 = 0 (infinity control on 80287): Bit 12 is ignored on the 80387 and 80486. 
On the 80287, this bit selects either affine (bit is on) or projective closure (bit is off). 
Affine closure allows the use of both positive and negative infinity. In projective 
closure, very large or very small numbers overflow to a single unsigned infinity. 
Only affine closure is supported by 80387 and 80486 NDP’s. 


RC— Rounding control: This field specifies how the NDP handles values that it 
cannot represent exactly. The RC field can be set to one of the following modes: 


00— Round toward nearest (choose even number if equidistant) 
01—Round down (toward negative infinity) 
10— Round up (toward positive infinity) 
11— Round toward zero (truncate) 
Mode 00 (round nearest) is the default. 


To see how the rounding control affects the results of a computation, assume that 
the NDP can represent only the integers -5 through +5. Figure 2-20 shows the 
results of rounding several positive and negative fractional values in each rounding 
mode. 


Figure 2-20. Rounding control. 
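

The four rounding modes can be tried out with a short Python sketch that rounds a value to an integer, in the spirit of the integers-only NDP imagined above. It assumes the 80387 encoding, in which RC = 01 rounds toward negative infinity and RC = 10 rounds toward positive infinity; the function name round_rc is hypothetical.

```python
import math

def round_rc(value, rc):
    """Round value to an integer according to a 2-bit RC field."""
    if rc == 0b00:
        return round(value)        # nearest; Python rounds ties to even
    if rc == 0b01:
        return math.floor(value)   # toward negative infinity
    if rc == 0b10:
        return math.ceil(value)    # toward positive infinity
    if rc == 0b11:
        return math.trunc(value)   # toward zero
    raise ValueError("RC is a 2-bit field")
```

The modes differ most visibly for negative values: -1.5 rounds to -2 in modes 00 and 01 but to -1 in modes 10 and 11.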


PC—Precision control: The PC field tells the NDP which floating-point format to 
use when generating the results of add, subtract, multiply, divide, and square-root 
operations. This field can hold one of the following values: 


00—Single-precision (24-bit significand) 
01—Reserved for future coprocessors 
10—Double-precision (53-bit significand) 
11—Extended-precision (64-bit significand) 
Mode 11 is the default. 


Instructions other than those affected by the PC field generate extended-precision 
results or have a precision specified by the operand. 


PM, UM, OM, ZM, DM, IM— Mask bits: The remaining bits in the control word 
register are the mask bits for the exception conditions and correspond to bits 0-5 of 
the status word register. The mask bits are: 


PM—Precision mask
UM—Underflow mask
OM—Overflow mask
ZM—Zero divide mask
DM—Denormal operand mask
IM—Invalid operation mask
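
As a rough illustration of the layout described above, the following Python sketch splits a control word value into its fields. The function name decode_control_word is hypothetical, and the sample value 037FH is chosen only because it has every mask bit set, PC = 11, and RC = 00, which matches the defaults named in the text.

```python
def decode_control_word(cw):
    """Split a 16-bit NDP control word into its named fields."""
    mask_names = ("IM", "DM", "ZM", "OM", "UM", "PM")   # bits 0 through 5
    fields = {name: (cw >> bit) & 1 for bit, name in enumerate(mask_names)}
    fields["PC"] = (cw >> 8) & 0b11     # precision control, bits 8-9
    fields["RC"] = (cw >> 10) & 0b11    # rounding control, bits 10-11
    fields["IC"] = (cw >> 12) & 1       # infinity control, meaningful on the 80287 only
    return fields

# Sample value (an assumption for illustration): all masks set, PC = 11, RC = 00.
print(decode_control_word(0x037F))
```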

Tag word register

The remaining status register on the NDP is the 16-bit tag word register. This register
consists of eight 2-bit fields, each corresponding to a floating-point register. T0 is
the field for register 0 (not ST0), T1 is associated with register 1, and so on. Each tag
field holds one of the following values, which give additional information about the 
contents of the corresponding register: 


00—The register contains a valid floating-point number. 

01—The register contains the value 0.0. 

10—The register contains the value infinity, a denormal, or an invalid number. 
11—The register is empty (unused). 


The tag word register is normally not used by the programmer. A debugger that dis- 
plays the contents of the floating-point stack must examine the contents of the tag 
word register to properly interpret the contents of the floating-point registers. 
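
A debugger of the kind just mentioned might decode the tag word along the following lines. This is a minimal Python sketch; it assumes that T0 occupies the two low-order bits of the tag word, and the names TAG_MEANING and decode_tag_word are invented for the example.

```python
# Short labels for the four tag values listed above.
TAG_MEANING = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}

def decode_tag_word(tw):
    """Return the meaning of tag fields T0 through T7 (T0 assumed in the low bits)."""
    return [TAG_MEANING[(tw >> (2 * n)) & 0b11] for n in range(8)]

# A tag word of FFFFH marks all eight registers as empty.
print(decode_tag_word(0xFFFF))
```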


Error pointer registers 

The only other registers on the NDP are the error pointer registers. These registers 
are updated each time a new floating-point instruction is executed. Whenever a 
floating-point instruction causes an exception, these registers can be queried to de- 
termine which instruction is at fault. Note that no instructions directly address these 
registers. The store environment operation (FSTENV) copies the contents of all 
NDP status and error-pointer registers to memory, where the data can be examined. 


The error pointer registers are necessary because of the parallel operation of the 
main execution unit and the NDP. The main execution unit, which is executing 
simpler, faster instructions, might be executing code in a different segment when 
the NDP generates an exception. The error pointer registers make it much easier to 
determine what went wrong when a floating-point exception occurs. 




FIP— Floating-point instruction pointer: This register is loaded with the con- 
tents of EIP when a coprocessor instruction is executed. 


FCS — Floating-point code segment: This register is loaded with the value of the 
CS register when a floating-point instruction is executed. 

FOP— Floating-point opcode: This register is loaded with 11 bits of opcode information.
A coprocessor instruction always has the following format:


      First byte                Second byte          (Optional bytes)

 7                 0       7                 0
| 1 1 0 1 1 | x x x |     | x x x x x x x x |


The second byte of the instruction is concatenated with the 3 low-order bits of the
first byte to form the contents of the FOP register. Early versions of the 80386 did
not generate this information for the 80387, nor is it available when the 80386 is
used in protected mode. It might be simpler to use the FCS and FIP values to determine
the opcode at fault, unless you know your code is running on an 80486.
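
The concatenation that forms the FOP value can be sketched in Python as follows. The function name fop_from_opcode is hypothetical, and the check on the five high-order bits assumes the usual coprocessor escape pattern 11011 in the first byte. The sample bytes D9H FAH are the encoding of FSQRT.

```python
def fop_from_opcode(first_byte, second_byte):
    """Build the 11-bit FOP value from the two opcode bytes."""
    if first_byte >> 3 != 0b11011:
        raise ValueError("not a coprocessor (ESC) opcode")
    # 3 low-order bits of the first byte, followed by all 8 bits of the second.
    return ((first_byte & 0b111) << 8) | second_byte

print(hex(fop_from_opcode(0xD9, 0xFA)))
```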


FOS— Floating-point operand segment: This register contains the segment 
register of the memory operand (if any) referred to by the most recent floating-point 
instruction. 


FOO— Floating-point operand offset: This register holds the offset (within the
segment pointed to by FOS) of the memory operand (if any) referred to by the most
recent coprocessor instruction.


MEMORY ARCHITECTURE: SEGMENTATION


A segmented memory architecture is a hallmark of the Intel 8086 family of pro- 
cessors. The 80386 was the first of these processors in which segmentation was not 
considered an impediment to the programmer. 


Linear vs. Segmented Memory 


The hardware interface between the CPU and memory is virtually the same in 
almost every computer. A set of address lines goes out from the processor to 
memory. The CPU places an address on the bus, and memory responds by return- 
ing the value stored at that location or by accepting a new value. Figure 3-1 shows 
the hardware relationship between the CPU and memory. 


[Diagram labels: Control signals; Coprocessor; Memory system]


Figure 3-1. CPU—memory interface. 

Because of the binary nature of the digital computer, a system with n address lines
can reference 2^n elements of memory. The hardware behaves in a
linear fashion; that is, for each of the 2^n possible combinations of address lines,
a separate memory element responds.


Most computers also have a linear memory model. They allow programmatic access 
to memory, beginning with address 0 and continuing through address 2^n - 1.
Theoretically, an application could read the byte at location 0, then read the next 
byte, and so on until it reads the last byte of memory in the system. This model par- 
allels the hardware interface. 


However, the 8086, 80286, 80386, and 80486 have a programmatic memory model 
different from the hardware memory model. These processors have a segmented 
memory model. To a program, the address space is divided into segments, and the 
program can access only data contained in those segments. Within each segment, 
addressing is linear, and the program can access byte 0, byte 1, byte 2, and so on. 
The addressing is relative to the start of the segment, however, and the hardware 
address associated with software address 0 is hidden from the programmer. 


This approach to memory management is natural. Programs are typically divided 
into segments of code and data. A program can be made up of a single code seg- 
ment and a single data segment, or of many code and data segments. In a multitask- 
ing environment, segmentation also isolates processes from one another. If my 
program can look at only my code and my data, it cannot illicitly modify your pro- 
gram’s code or data. Figure 3-2 shows a multiprocessing system with many seg- 
ments coexisting in memory. 


[Diagram labels: hardware addresses; segment addresses; Prog 1 code; Prog 2 code;
Prog 2 data; Prog 1 data]


Figure 3-2. Memory divided into segments. 

The 80386 and 80486 have six segment registers. The values in these registers determine
the memory segments that a program can access. The CS register points to the
segment that contains the program’s code. CALL and JMP instructions implicitly 
refer to the current code segment. The DS register points to the program’s main 
data area. For example, the instruction: 


MOV AL, [0] 
copies the first byte (byte 0) of the data segment into register AL. 


The stack segment (pointed to by register SS) is commonly (but not necessarily) the 
same segment as the data segment. The PUSH and POP instructions store data into 
or read data from the stack segment. 


Three additional registers (ES, FS, and GS) point to auxiliary data that the program
needs to access less frequently, such as COMMON variables in a FORTRAN program.
You can apply a special prefix to an instruction that accesses the data segment.
The prefix causes the instruction to act on one of the additional
segments instead. For example, the previous instruction might be written as:


MOV AL, ES:[0] 

to fetch the first byte from one of the alternate data segments, or even as: 
MOV AL, CS:[0] 

to fetch the first byte from the code segment. 


Previous generations of the 8086 family also dealt with segmented memory; how- 
ever, these processors limited the size of a segment to 64 KB, which was often much 
too small. A single segment in the 80386 and in the 80486 can be as large as 4 
gigabytes (GB). 


An operating system designer can choose to simulate a linear memory model (also 
called a flat model) on the 80386 and 80486 by creating one very large code seg- 
ment and one very large data segment and having all programs use the same values 
for CS and DS. This is a common technique when porting systems that have run on 
linear address machines. The UNIX operating system—with its VAX heritage—is 
typically implemented on linear memory machines. 


Virtual Addressing 


Except when operating in real mode, the 80386 and 80486 are virtual memory pro- 
cessors. When an instruction requests the contents of a memory location, the in- 
struction refers to the location not by an actual hardware memory address but by a 
virtual address. The virtual address is really a name for a memory location. The 
processor translates the location name into an appropriate physical location. The 
operating system must maintain the proper mapping between virtual and physical 
memory.

This concept is not as convoluted as it might sound. For example, suppose someone
says to me, “Put this report on the boss’s desk.” In my particular department, that 
might mean, “Put this report on Simon Legree’s desk.” If, however, I transfer to a 
new department, I might be placing my report on Ebenezer Scrooge’s desk. “The 
boss’s desk” is a virtual location, and I can carry out the instruction to turn in my 
report even though the desk on which I place the report varies according to the 
circumstances. 


A virtual address on the 80386 family is specified by two numbers, a selector and an 
offset. The selector is a 16-bit value that serves as a virtual name for a memory seg- 
ment. It is the selector that is loaded into the segment registers (CS, DS, and so on). 
The offset is the distance from the beginning of the segment, and it is a 32-bit value. 
Examples of virtual addresses include: 


Virtual Address     Interpreted Virtual Address
3F11:00000000       Offset 0H from selector 3F11H
01A9:0001FF00       Offset 1FF00H from selector 01A9H
EC2C:31887004       Offset 31887004H from selector EC2CH


The CPU translates a virtual address to a single 32-bit number called a linear ad- 
dress. Figure 3-3 shows an example of address translation. This linear address goes 
out on the system bus unless the paging feature is enabled. Paging is another level 
of address translation and is discussed fully in Chapter 6. 


[Diagram labels: 4 GB memory; offset from start of segment; segment base address;
virtual address translation]


Figure 3-3. Linear address translation. 

Virtual-to-linear address translation


The CPU uses the selector as an index to a set of system tables called descriptor 
tables. A descriptor is a block of memory that describes the characteristics of a 
given element of the system. In the case of a memory segment, the characteristics 
include the segment’s linear base address, limit, access rights, and privilege level. 


The base address is the starting point in the segment’s linear address space. The offset
portion of a virtual address is added to the base address to generate the linear
address of the desired memory element. Figure 3-4 illustrates an example. The virtual
address 13A7:0010F405H is broken down into its segment and offset components.
The system uses the selector 13A7H as an index into its descriptor tables. It
pulls out a descriptor that says, for example, that the segment has a base address in
the linear address space of 0032DD000H. The virtual address offset is combined
with the base, and the resulting value, 33EC405H, is the translated linear address.
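
The arithmetic of this example can be checked with a short Python sketch. The table SEGMENT_BASES and the function to_linear are hypothetical stand-ins: a real descriptor table is indexed by part of the selector and holds more than a base address, so keying a dictionary by the whole selector is a simplification.

```python
# Hypothetical descriptor table, reduced to the one field needed here.
SEGMENT_BASES = {0x13A7: 0x032DD000}

def to_linear(selector, offset):
    """Translate a selector:offset virtual address to a 32-bit linear address."""
    base = SEGMENT_BASES[selector]
    return (base + offset) & 0xFFFFFFFF   # address arithmetic is 32 bits wide

print(f"{to_linear(0x13A7, 0x0010F405):X}H")
```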


The linear address is a full 32-bit value in all members of the 80386 family; however,
the 80386SX hardware supports only a 24-bit physical address. The 80386DX and
80486 hardware supports the full 32-bit linear address space (2^32 bytes, or 4 GB). The
base address of a segment will always fall within this range. In the same way that
the base address defines the starting point of a segment, the limit field defines the
end point. The limit specifies the segment’s last addressable byte. The processor
checks every instruction that addresses memory to determine whether the instruction
is attempting to read from or write to memory within the boundaries given in
the segment’s descriptor. An out-of-bounds reference causes an interrupt called a
general protection fault to occur. Faults are discussed in the section on interrupts
and exceptions in Chapter 5. The access rights field defines the type of segment and
the privilege level required to access it.


[Diagram: the virtual address 13A7:0010F405H is split into a selector and an offset;
the selector picks a descriptor, and the descriptor’s base address is added to the
offset, yielding the linear address 33EC405H within the 4 GB memory space.]


Figure 3-4. Virtual-to-linear address translation.


Segment descriptors 

At this point, you probably visualize a descriptor as something like the item in 
Figure 3-5. Indeed, all the data in this figure is in the descriptor; however, because 
of space and compatibility constraints, the real thing is not quite so pretty. Figure 
3-6 shows the actual format of a segment descriptor.


| Base address |


Figure 3-5. Visualized descriptor. 


80386/80486

Bits     Field
63-56    Base address 24..31
55       G
54       D/B
53       0
52       AVL
51-48    Limit 16..19
47-40    Access rights (P, DPL, S, Type, A)
39-16    Base address 0..23
15-0     Limit 0..15

80286

Bits     Field
63-48    Reserved (must be 0)
47-40    Access rights (P, DPL, S, Type, A)
39-16    Base address 0..23
15-0     Limit 0..15

Figure 3-6. Actual 80286/80386/80486 descriptors.

Base address: The base address portion of the descriptor is the address of offset 0
in the segment. This field is 32 bits and is constructed from bytes 2, 3, 4, and 7 of 
the descriptor. In the 80286, the base address is only 24 contiguous bits. However, 
Intel specified that bytes 6 and 7 of the 80286 descriptor were to be set to 0 to en- 
sure that 80286 code would run properly on an 80386/80486. 

Limit: The limit field determines the last addressable unit of the segment. The limit
field is 20 bits, comprising bytes 0 and 1 of the descriptor and the low-order four bits
of byte 6. Again, the split occurs because of the difference in the limit field sizes
between the 80286 and the 80386/80486. Those of you handy with binary arithmetic
might note that a 20-bit limit field allows the addressing of only 2^20, or
approximately 1 million, items.


At first glance, this seems to mean that an 80386/80486 segment is limited to 1 
megabyte. This is not the case, although the segment is limited to 1 million items. 
The G bit in byte 6 of the descriptor stands for granularity, and 80386/80486 seg- 
ments come in two forms, byte granular (G = 0) and page granular (G = 1). 


The terms granularity and resolution are similar in meaning. A high-resolution im- 
age is made of very tiny items, and a lower-resolution image is made of larger items. 
The limit of a byte granular segment is measured in bytes; a page granular segment 
is measured in larger pieces called pages. 


A page is 2^12, or 4096, bytes. This makes the limit on the size of a segment 2^20 pages
of 2^12 bytes, for a total of 2^32 bytes (4 GB). Again, a segment of code ported from the
80286 is always a byte granular segment because the seventh and eighth descriptor
bytes are required to be 0.


For example, assume that the DS register points to a byte granular segment with a
limit of 001FH. The size of the segment is 20H (32 decimal) bytes, and the last
addressable byte of that segment is byte 001FH.


Illegal Instruction      Reason
MOV EAX, [1234H]         Memory address beyond limit
MOV EAX, [001DH]         Size of item read extends beyond limit
MOV AL, [0020H]          Memory address beyond limit
MOV [001FH], AX          Size of item written extends beyond limit

Legal Instruction        Reason
MOV EAX, [0000H]         Last byte read is 3H
MOV EAX, [001CH]         Last byte read is 1FH
MOV AL, [001FH]          Last byte read is 1FH
MOV [001EH], AX          Last byte written is 1FH


Now imagine a page granular segment with a limit of 0000H. The size of the segment
is one page, and page 0 is the last addressable page. A page has 1000H (4096
decimal) bytes in it, so the last addressable byte is 0FFFH.




3: Memory Architecture 


Illegal Instruction      Reason
MOV EAX, [1234H]         Memory address beyond limit
MOV EAX, [0FFDH]         Size of item read extends beyond limit
MOV AL, [1020H]          Memory address beyond limit
MOV [0FFFH], AX          Size of item written extends beyond limit

Legal Instruction        Reason
MOV EAX, [0000H]         Last byte read is 3H
MOV EAX, [0FFCH]         Last byte read is 0FFFH
MOV AL, [0FFFH]          Last byte read is 0FFFH
MOV [0FFEH], AX          Last byte written is 0FFFH
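
The limit checks in the two tables above can be modeled with a small Python sketch. The names effective_limit and access_ok are hypothetical, and the sketch covers only normal (expand-up) segments; it is meant to mirror the rule in the text rather than to describe the processor's internal logic.

```python
def effective_limit(limit, page_granular):
    """Return the offset of the last addressable byte of the segment."""
    if page_granular:
        # The limit counts 4096-byte pages, so the last byte of the last page counts.
        return (limit << 12) | 0xFFF
    return limit

def access_ok(offset, size, limit, page_granular=False):
    """True if every byte of the item lies at or below the segment limit."""
    last_byte = offset + size - 1
    return last_byte <= effective_limit(limit, page_granular)

# MOV EAX, [001DH] in a byte granular segment with limit 001FH: bytes 1DH-20H.
print(access_ok(0x001D, 4, 0x001F))
# MOV EAX, [0FFCH] in a page granular segment with limit 0000H: bytes 0FFCH-0FFFH.
print(access_ok(0x0FFC, 4, 0x0000, page_granular=True))
```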


Access rights: The access rights portion of the descriptor has the following format: 


  7   6 5   4   3 2 1   0
| P | DPL | S | TYPE  | A |


The P bit stands for “present.” It is set to 1 when the segment indicated by the selec- 
tor is present in physical memory. In a virtual memory system, the operating system 
can move the contents of some segments to disk if physical memory is full. It then 
marks the descriptor as not present by resetting the P bit to 0. If an application loads 
a selector into a segment register and the descriptor associated with the selector has 
P = 0, the not-present interrupt (11 decimal) is generated. The operating system 
then looks for a free area of physical memory, copies the contents of the segment 
from disk back into memory, updates the descriptor with the new base address, sets 
P to 1, and restarts the interrupted instruction. 


The DPL field contains the privilege level of the descriptor. The privilege level
ranges from 0 (most privileged) through 3 (least privileged). A task can read data
from or store data into only segments of equal or lesser privilege. A task can call
only code segments of the same privilege; however, access to segments of higher
privilege can be granted indirectly via a gate, a feature of the protection
mechanism. A task can never invoke a code segment of lesser privilege.


The privilege level of a task, called the current privilege level (CPL), is the privilege 
level of the currently executing code segment. Typically, the most secure portions 
of the operating system run at level 0. Other system software might run at a less 
privileged level, and applications typically run at level 3. (See Chapter 5 for a de- 
scription of the protection mechanism.) 

The S (segment) bit is always set to 1 for a memory segment. When S is equal to 0, a
descriptor describes an object other than a memory segment. These objects are de- 
scribed in Chapter 5. 


The TYPE field indicates the types of operations allowed on the segment. Valid 
values for TYPE are: 


0  Read-only data segment
1  Read/write data segment
2  Unused
3  Read/write expand-down data segment
4  Execute-only code segment
5  Execute/readable code segment
6  Execute-only “conforming” code segment
7  Execute/readable “conforming” code segment


The type indicator defines the access rules applied to a segment. The CS register 
cannot be loaded with a selector of a segment of type data (0-3). No program can 
modify a segment that cannot be written. Segments that are not readable can be 
executed but not read as data. An attempt to violate any of these rules results in a 
protection fault. Conforming segments are discussed in Chapter 5. Expand-down 
segments are covered later in this chapter. 


The processor sets the A (accessed) bit when the selector for the descriptor is loaded 
into a segment register. The operating system can use this bit to find out which seg- 
ments are not frequently used and can therefore be swapped to disk if necessary. 


Additional fields: Four additional fields in the segment descriptor are located in
the high-order nibble of byte 6.


The G bit, described previously, regulates the granularity of the segment. 


Bit 6 is referred to as the D bit if the descriptor is for an executable segment or as the 
B bit if the descriptor type is a data segment. The D bit is set to 1 to indicate the 
default, or native mode, instruction set. When D is equal to 0, the code segment is 
presumed to be an 80286 code segment, and it runs with 16-bit offsets and the 
80286-compatible instruction set. 


The B bit is set to 1 in any data segment whose size is greater than 64 KB.


Bit 5 must be set to 0. It is reserved for use in a future Intel microprocessor.


Bit 4 (AVL) is available for use by system programmers. Possible uses include marking
segments for garbage collection or indicating segments whose base addresses
should not be modified.
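
To tie the descriptor fields together, the following Python sketch unpacks an 8-byte 80386/80486 segment descriptor using the byte and bit positions given in this section. The function name parse_descriptor and the sample bytes are assumptions made for illustration; the sample describes a present, read/write, page granular data segment at privilege level 0.

```python
def parse_descriptor(d):
    """Decode an 8-byte 80386/80486 segment descriptor (byte 0 first)."""
    if len(d) != 8:
        raise ValueError("a descriptor is 8 bytes")
    access = d[5]                       # access rights byte
    return {
        "base": d[2] | (d[3] << 8) | (d[4] << 16) | (d[7] << 24),
        "limit": d[0] | (d[1] << 8) | ((d[6] & 0x0F) << 16),
        "G": (d[6] >> 7) & 1,           # granularity
        "D/B": (d[6] >> 6) & 1,
        "AVL": (d[6] >> 4) & 1,
        "P": (access >> 7) & 1,         # present
        "DPL": (access >> 5) & 0b11,    # descriptor privilege level
        "S": (access >> 4) & 1,         # 1 for a memory segment
        "TYPE": (access >> 1) & 0b111,
        "A": access & 1,                # accessed
    }

# Hypothetical sample descriptor bytes.
sample = bytes([0xFF, 0xFF, 0x00, 0xD0, 0x2D, 0x92, 0xCF, 0x03])
fields = parse_descriptor(sample)
print({name: hex(value) for name, value in fields.items()})
```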


Expand-down segments, indicated by TYPE = 2 or TYPE = 3, are a special kind of 
data segment designed for use with the stack. Figure 3-7 shows a stack that resides 
in its own segment.

[Diagram labels: Limit; ESP; SS; 0]


Figure 3-7. Stack residing in its own segment. 


As more data is pushed onto the stack, the stack pointer (ESP) nears 0. If too much 
data is pushed onto the stack, the program attempts to decrement ESP beyond 0, 
resulting in a stack fault. At this point, the operating system has no choice but to ter- 
minate the program. 


Placing the stack in an expand-down segment rather than in a normal data segment, 
however, will change the way memory is addressed inside the segment. 


Although normal segments are addressed beginning at 0 and extending to limit,
expand-down segments begin at limit + 1 and extend to FFFFFFFFH. Figure 3-8
illustrates the difference.


[Diagram: a normal data segment, addressed from 0 up to its limit, beside an
expand-down segment, addressed from limit + 1 up to FFFFFFFFH; ESP points into
each segment.]


Figure 3-8. Normal data segments and expand-down segments. 


The advantage of this approach is that when the stack pointer is decremented past
the limit and triggers a stack fault, the operating system can extend the size of the
segment and decrement the limit. The faulting instruction is then restarted, allowing
the program to run with a larger stack segment. Figure 3-9
shows how this is accomplished.


Notice that when a descriptor for an expand-down segment is created, the base address
must be set to the linear address of the first byte after the end of the segment
rather than to the address of the start of the segment. Because addressing arithmetic
is limited to 32 bits, large offset values can be viewed as if they were negative
numbers. For example:


base + FFFFFFFFH = base + (-1) = base - 1


[Diagram labels: FFFFFFFFH; 2048; 3096; Old limit; ESP; New limit]

Figure 3-9. Extending the size of the segment. 
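
The expand-down addressing rule and the 32-bit wraparound described above can be sketched in Python as follows. The names expand_down_ok and linear are hypothetical, and the base and limit values in the usage lines are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def expand_down_ok(offset, size, limit):
    """In an expand-down segment, valid offsets run from limit + 1 through FFFFFFFFH."""
    return offset > limit and offset + size - 1 <= MASK32

def linear(base, offset):
    """Add base and offset with 32-bit wraparound."""
    return (base + offset) & MASK32

base = 0x00200000                       # first byte after the end of the segment
print(hex(linear(base, 0xFFFFFFFF)))    # same as base - 1

limit = 0xFFFFF7FF                      # leaves 2048 bytes of stack below the base
print(expand_down_ok(0xFFFFFFFC, 4, limit))   # top of the stack: allowed
print(expand_down_ok(0xFFFFF7FC, 4, limit))   # at or below the limit: stack fault
```

Lowering the limit, as the operating system does when it extends the segment, makes more of the low offsets valid without moving the base.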


Descriptor tables 

All the descriptors are grouped together in descriptor tables. The two system de- 
scriptor tables are the Global Descriptor Table (GDT) and the Interrupt Descriptor 
Table (IDT). The IDT contains no segment descriptors, so it is not discussed here. 


A full description of the IDT and other facets of the protection mechanism is given 
in Chapter 5. 


An operating system can also implement various Local Descriptor Tables (LDTs). 
Segment descriptors are found either in the GDT or in the currently active LDT. The 
selector used to identify the descriptor determines which table to use. The location 
of the tables in memory is determined by the GDTR, IDTR, and LDTR registers. 


Selectors 

A segment, as we have seen, is described by a descriptor that has been selected by 
a selector. A selector is made of three components, as shown in the following 
illustration: 


 15                        3    2    1   0
|          INDEX            | TI |  RPL  |


The INDEX and TI (table indicator bit) fields tell the CPU where to find the descriptor.
When the TI bit is set to 0, the descriptor is in the GDT. When TI is set to 1, the
processor uses the current LDT instead. The INDEX field identifies which entry in 
the descriptor table to use. Be aware that the RPL (requested privilege level) can 
differ from the actual descriptor privilege level. The reason for this is discussed in 
detail in Chapter 5. 




As an example of how the selection mechanism works, assume that the value 
1A3BH is a valid selector. The selector is divided as follows: 


Selector = 1A3BH = 0001101000111011B

    INDEX = 0347H (839 decimal)
    TI    = 0 (GDT)
    RPL   = 3 (lowest)


To use a selector, hardware must first break it into three fields: INDEX, TI, and RPL.


Figure 3-10 illustrates how hardware separates a selector into its components. 


[Diagram labels: GDT; LDTs (Current)]


Figure 3-10. Hardware’s separation of selector components. 
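
The separation of a selector into its fields amounts to two shifts and two masks. The Python sketch below reproduces the 1A3BH example; the function name split_selector is hypothetical.

```python
def split_selector(selector):
    """Break a 16-bit selector into its INDEX, TI, and RPL fields."""
    return {
        "INDEX": selector >> 3,        # bits 15-3: entry in the descriptor table
        "TI": (selector >> 2) & 1,     # bit 2: 0 = GDT, 1 = current LDT
        "RPL": selector & 0b11,        # bits 1-0: requested privilege level
    }

print(split_selector(0x1A3B))
```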


Games Segments Play 


By making use of the virtual addressing capabilities, an operating system designer 
can provide a number of interesting features. One such feature is virtual memory. 
Virtual memory gives the appearance of physical memory where none exists.


To illustrate how this can be accomplished, imagine an environment such as the
one pictured in Figure 3-11. The figure represents a multitasking
system in which four tasks are to be run. One MB of memory is available for
running the four applications. Application A requires 400 KB, application B requires 
100 KB, application C requires 400 KB, and application D requires 200 KB. Also 
assume that half of the application space is dedicated to code and that the other half 
is required for data. 


Because the combined memory requirement of the four applications exceeds 1 MB, 
they cannot all be in memory simultaneously. After A, B, and C are loaded, not 
enough room remains for all of task D. (See Figure 3-12.) The
operating system loads the code portion of task D but not the data segment. It does, 
however, create descriptors for both the code and the data segments of task D, 
marking the data segment descriptor as not present. 

[Diagram: 1 MB of system memory beside the applications to be run: A (400 KB),
B (100 KB), C (400 KB), and D (200 KB).]


Figure 3-11. Initial state of a multitasking system. 

Figure 3-12. Initial tasks loaded into memory.


This is a multitasking system, so the starting address (CS:EIP) of each task is passed 
to the scheduler portion of the operating system, and execution begins. Task A 
starts and is allowed to execute for a few milliseconds. The scheduler then takes 
control and allows task B to run for a few milliseconds. However, part way through 
its allotted time slice, task B reads the keyboard for input from the operator. Be- 
cause no keys have yet been pressed, the operating system takes control and marks 
task B as suspended. 


The scheduler then gives control to task C, which runs through its allotted execu- 
tion time. Control now passes to task D. It begins to execute, but as soon as it tries to 
refer to the data segment, the processor generates the not-present interrupt. 


The operating system determines which task was executing when the interrupt oc- 
curred and what caused the interrupt. It determines that task D needs access to its 
data segment, so it evaluates the status of the other tasks. Task B is suspended, so 
the operating system decides to temporarily remove it from memory to make room 
for the data segment of task D. 


The memory image of B is written to disk, and the descriptors for B are marked as 
not present. Task B is said to have been swapped out, and operating systems that 
implement virtual memory in a similar manner are implementing swapping. 

The data segment for D is copied into memory at the physical location just vacated
by B, and the descriptor for D is updated to reflect the new base address and to 
show that the segment is now present in memory. Figure 3-13 reflects the new state 
of the system. 


[Diagram labels: Descriptor table; Disk]


Figure 3-13. Swapping tasks B and D. 


The scheduler now rotates execution time among tasks A, C, and D. At some point 
the computer operator sees the prompt for input from task B and in response 
presses a key on the keyboard. This action causes a hardware interrupt, and the 
operating system realizes that it must now schedule task B. However, because none 
of the other tasks are suspended, the system might choose to suspend task A 
temporarily. 


Because task B is small, it displaces only part of task A. The code segment of task A 
is marked as not-present, task B is swapped in, and the descriptors for tasks A and B 
are updated as shown in Figure 3-14. Notice that task B is now
running at a different physical address than when it began. This is invisible to the 
application, however, because the selectors loaded into the segment registers do not 
change and because the memory offsets used by the instructions in the code seg- 
ment are relative to the starting point of the segment, regardless of the physical 
origin of the segment.


The system will continue to operate as previously described, with occasional swap- 
ping and shifting of segments. If no external condition exists that causes a segment 
to swap, the operating system might swap segments, based either on which tasks 
have run the longest or on another system of priority. 

Figure 3-14. Swapping tasks A and B.


Performance considerations 


As the previous example shows, virtual memory doesn’t create RAM out of thin air; 
it uses secondary storage, usually disk, to supplement the primary (RAM) storage 
and give the appearance of more primary storage than exists in the system. The 
cost of keeping up appearances is the amount of time it takes to move data between 
primary and secondary storage. The more time the system has to spend swapping, 
the less time it can spend executing the applications. In extreme cases, a system can 
be so overextended that it spends all its time swapping segments in and out. This 
pathological situation is called thrashing. 


An operating system designer can improve the performance of a virtual memory 
system. For example, in the Intel protection mechanism, code segments are immu- 
table. Because the contents of a code segment do not change, it doesn’t have to be 
written to disk when swapped out. You can re-create the contents from the original 
executable image of the program. Only swapping in requires access to secondary 
memory. The operating system, therefore, can swap code segments twice as fast as 
it can swap data segments. Actually, if you recall the contents of a descriptor, you 
will remember that certain kinds of data segments can be marked as read-only. As 
with code segments, read-only data segments do not have to be written to second- 
ary storage when swapped out. 


Another trick that designers can use also relies on knowledge about code segments. 
The technique of segment sharing lets two or more tasks share the same code. This 
is primarily effective in multiuser systems. In the previous example, assume that 
tasks A, B, C, and D represent users running applications. Suppose that users A and 
C are running the same application, perhaps a spreadsheet. Now users A and C are 
operating on different data and require separate data segments. They are, however, 
executing the same code. Figure 3-15 shows how all four applications can fit in 
physical memory in this situation. The users maintain separate descriptors for their
code and data, but the base addresses for the code segments of A and C point to the 
same location. 


[Diagram labels: Descriptor table; B data; C code]


Figure 3-15. Tasks A, B, C, and D in physical memory. 


Finally, a segment-oriented virtual memory system can provide a way to compact
memory. Compacting memory helps solve a problem called fragmentation. Frag-
mentation occurs when enough memory is free to run additional applications but
the free memory is not contiguous. To put it another way, the pieces of available
memory are small and scattered throughout physical memory, and to be useful they
need to be next to one another. Figure 3-16 illustrates this problem. Compaction
moves the segments in use together so that the free pieces merge into one block.
Because applications deal with virtual addresses, they are not affected by a change
in location. The process does take up CPU time, however.
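
The compaction step can be sketched in a few lines of C. This is only an illustration under simplifying assumptions: the seg_desc structure and the compact function are hypothetical names, the descriptor holds nothing but a base and a size, and the segments are assumed to be sorted by address and not to overlap.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified descriptor: only the fields this sketch needs. */
struct seg_desc {
    uint32_t base;  /* starting address of the segment in physical memory */
    uint32_t size;  /* segment length in bytes */
};

/*
 * Slide every segment toward address 0 so that all free memory ends up in
 * one contiguous block above the last segment. The segs array must be
 * sorted by ascending base, and the segments must not overlap.
 * Returns the address of the first free byte after compaction.
 */
uint32_t compact(uint8_t *mem, struct seg_desc *segs, size_t count)
{
    uint32_t next = 0;  /* lowest address not yet claimed by a segment */

    for (size_t i = 0; i < count; i++) {
        if (segs[i].base != next) {
            /* memmove tolerates overlapping source and destination */
            memmove(mem + next, mem + segs[i].base, segs[i].size);
            segs[i].base = next;  /* only the descriptor changes */
        }
        next += segs[i].size;
    }
    return next;
}
```

Notice that the only bookkeeping is the update of the base field. An application that addresses its data through a selector and an offset never sees the move; the cost is the CPU time spent copying.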


Figure 3-16. Memory fragmentation.

Why bother?


Because virtual memory is plagued with potential performance problems and adds 
to the complexity of operating systems by forcing them to deal with fragmentation 
and with identifying shareable segments, you might be tempted to ask, “Is it worth 
the effort?” In most cases, the answer is “Yes.” 


One clear advantage of virtual memory is that a user doesn’t have to spend money 
for extra memory simply to get an application to run. Any application will run in 
existing memory; it will simply run more slowly if it has to be swapped out. Let’s say 
that I have a system with 2 MB of physical memory and that 90 percent of my appli- 
cations fit into physical memory. However, 10 percent of the time I run an applica- 
tion that requires 5 MB of memory. Without virtual memory, I can’t run the large 
application unless I spend the extra money to buy 3 MB of memory that will remain 
unused 90 percent of the time. With virtual memory, I can at least run the applica- 
tion and decide whether I want to spend money to improve its performance. 


Virtual memory also makes life easier for the application designer. What if you are 
writing a program that manipulates a large array? If virtual memory is not available, 
you have to worry about how much memory your typical user will have and how to 
make your program fit into a system of that size. As a designer, you can no longer 
worry about simply solving the problem at hand (the array manipulation). You must 
also be concerned about breaking your program into pieces that will fit on the typi- 
cal system. The complexity of your application increases, and the application is thus 
more likely to contain bugs. 


This situation might be likened to giving a speech simultaneously in two different 
languages. By letting someone else handle the translation, you can concentrate on 
your job—presenting your information. 


The dark side of the force 


So far, only the advantages of segmentation have been discussed. Let’s take another 
look at segments and see if we can uncover some problem areas. One advantage of 
segmentation is virtual addressing. The application deals with selectors, whereas 
the linear memory address for the segment is in the descriptor. Thus, every time a 
selector is loaded into a segment register, the contents of the descriptor must be 
fetched as well. Every instruction that causes a segment register to be loaded also 
causes the 8-byte descriptor for the segment to load. In addition, the descriptor is 
marked as accessed when it is loaded, so a memory write is required to set the bit in 
the descriptor.


At a minimum, therefore, a segment register load has an overhead of two memory 
read cycles and one memory write cycle in addition to any memory cycles required 
to fetch the operand of the load instruction. Because of this and the protection 
checking that the CPU does based on the type of segment, size of descriptor table, 
and privilege level, loading a segment register can take as long as 22 clocks as op- 
posed to the typical 2 clocks that it takes to load a general-purpose register. 




Another advantage of segmentation is the limit checking that the processor per- 
forms. If a data object such as an array is placed in its own segment, the CPU moni- 
tors all references to the object and triggers an interrupt if any instruction refers to a 
point beyond the bounds of the object. Limit checking is an excellent tool for help- 
ing programmers discover flaws in their programs. Unfortunately, using this tool 
means having many data segments. Having many data segments implies many seg- 
ment register load operations, which slow down the program. You must also deal 
with 48-bit pointers—16 bits of selector and 32 bits of offset. 


The 80386 and 80486 do not provide many instructions for handling these ir- 
regularly sized items, nor do many programming languages. Consequently, they are 
awkward to manipulate, and they cause more work for the programmer. 


Finally, you must deal with the problem of fragmentation. Because segments come 
in odd sizes, the operating system must work harder to arrange physical memory 
space in which to load applications. 


Summary 


As you have seen, segmentation is a mixed blessing. On the one hand, it provides a 
method for implementing virtual memory and a mechanism for implementing a 
secure operating system via privilege levels, and the segment limits assist program- 
mers in tracking bugs that arise from invalid pointers or array boundary errors. On 
the other hand, segmentation gives rise to unwieldy 48-bit pointers, extracts a per- 
formance penalty, and can cause fragmentation when used to implement virtual 
memory. 


The flexibility of the 80386 family offers system designers three choices. You can 
ignore segmentation completely by creating only one code segment and one data 
segment that encompass the entire address space. You can use a limited form of 
segmentation in which only two segments, code and data, exist for every user or 
task on the system. In this instance, the application sees a uniform address space, 
and only the operating system needs to deal with segments. Or you can implement 
a fully segmented system in which each large data object and each module of code 
is in a separate segment. 


Each implementation has advantages. The first method gives you an architecture 
similar to the M68000 or VAX. Although it might seem that you lose the capability to 
implement virtual memory with this method, you can implement a form of virtual 
memory other than the one described here by using paging, which is discussed in 
Chapter 6. A system of this design, however, loses the privilege-level protection fea- 
tures provided by segmentation. 


The second method strikes a balance between the other two. Protection is provided 
on a task-by-task basis, and virtual memory can be implemented through segmenta- 
tion, paging, or both. 

The third method is the most similar to that provided by OS/2 on the 80286 and to
programming in the large memory model. This type of system can provide a very 
secure environment, but the system will run somewhat slower. 


One beauty of the 80386 family is that it supports these divergent environments and 
allows designers to build systems that meet their needs, whether those needs be for 
high security or for high performance. 




4


THE BASIC INSTRUCTION SET


The processors of the 80386 family are classic stored program, or von Neumann, pro-
cessors; that is, the memory attached to the CPU stores not only the data to be oper-
ated on but also the instructions that specify the operations. The term von Neumann
is used in honor of the mathematician John von Neumann, who wrote a series of
papers in the mid-1940s outlining the design of stored program computers. Almost
all commercially available computers have designs based on the von Neumann
model, and those using the 80386 and the 80486 microprocessors are no exception.


Built into every stored program computer is a set of commands that cause the CPU 
to read from a location in memory, interpret the contents as an instruction (that is, 
as a command to perform some function), execute the function, and start the cycle 
over again. Because this sequence is often implemented in microcode, it is com- 
monly referred to as the microcycle. 


In one of the earliest stored program computers, the EDVAC, each machine instruc- 
tion was broken down into five fields: A bit pattern in one field designated the 
operation to be performed, two fields designated input operands, one field speci- 
fied where the result was to be stored, and the final field specified the location of 
the next instruction. Computer designers soon learned that if they placed one in- 
struction after another, they could eliminate the field that specified the address of 
the next instruction. A register called the program counter or instruction pointer 
was used to point to the next instruction and was incremented to point to the next 
one as soon as each instruction was fetched. 

This method has never been modified, and the typical microcycle can now be
expressed algorithmically like this: 


top:
    fetch the instruction at EIP
    increment EIP by the size (in bytes) of the instruction
    execute the instruction
    goto top


This is, of course, a simple view of the microcycle. In actuality, it is much more 
complex because of the parallelism built into the 80386 family (see Chapter 1) and 
because of the necessity of saving the state of the processor if an instruction faults 
and has to be restarted. However, the basic algorithm is all that is necessary to un- 
derstand the process. 
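
To make the loop concrete, here is a minimal sketch in C of a toy interpreter that follows the same fetch, increment, execute pattern. It is an illustration only: the toy_cpu structure and the run function are hypothetical, and the sketch recognizes just five single-byte opcodes (CMC, CLC, STC, NOP, and HLT), so the increment step is always 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy machine state: an instruction pointer and a carry flag. */
struct toy_cpu {
    uint32_t eip;  /* offset of the next instruction in the code array */
    bool cf;       /* carry flag */
};

/*
 * Run until HLT, an unrecognized opcode, or the end of the code array.
 * Returns the number of opcodes fetched.
 */
size_t run(struct toy_cpu *cpu, const uint8_t *code, size_t len)
{
    size_t fetched = 0;

    while (cpu->eip < len) {
        uint8_t opcode = code[cpu->eip];  /* fetch the instruction at EIP */
        cpu->eip += 1;                    /* every opcode here is 1 byte long */
        fetched++;

        switch (opcode) {                 /* execute the instruction */
        case 0xF5: cpu->cf = !cpu->cf; break;  /* CMC: complement carry */
        case 0xF8: cpu->cf = false;    break;  /* CLC: clear carry */
        case 0xF9: cpu->cf = true;     break;  /* STC: set carry */
        case 0x90:                     break;  /* NOP */
        case 0xF4: return fetched;             /* HLT: leave the loop */
        default:   return fetched;             /* not modeled: stop */
        }
    }
    return fetched;
}
```

Running the byte sequence F9 F5 90 F5 F4 through run sets the carry flag, complements it twice, and halts with the flag set and EIP equal to 5.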


Instruction Format 


Instructions are stored in memory in the same way that characters, floating-point 
numbers, integers, or any other type of data is stored in memory. The value 0F5H,
for example, is the encoding for the CMC (complement carry flag) instruction. An 
instruction can range from 1 byte to 15 bytes in length.


In general, the format of an 80386 or 80486 instruction looks like this:


    prefix(es)   opcode   mod r/m   s-i-b   displ   data


The opcode is 1 or 2 bytes. The mod r/m and s-i-b bytes specify the operands and 
memory addressing modes. The displ (displacement) field is part of the memory 
address and can be 1, 2, or 4 bytes. The data field specifies an immediate operand 
value and can also be 1, 2, or 4 bytes. As many as four prefix bytes can precede the
opcode field.


Not all fields are present in all instructions. The CMC instruction, as shown pre- 
viously, consists of only a single opcode byte. The instruction: 


XCHG EAX, EBX 


consists of only the opcode and mod r/m fields. All fields are present in the 
instruction:


ADD [EBP+8][ESI*4], 17


Appendix D specifies the bit patterns used to encode instructions, and Appendix E 
contains a table that lets you decode bit patterns into the original assembly language 
mnemonics. 

Instruction Operands


The instructions stored in memory command the CPU to manipulate one or more 
operands. Instruction operands can be specified in one of five ways: They can be 
implicit, register, immediate, I/O, or memory reference operands. 


Implicit operands 


An operand is implicit if the instruction itself specifies it. The CLI instruction, for 
example, operates on the IF bit in the EFLAGS register. The programmer does not 
have to specify anything beyond the instruction. The stack is an implicit operand in 
a number of instructions—for example, PUSH, POP, CALL, and IRET. However, be- 
cause the stack resides in memory, I will discuss stack operands in the section on 
memory reference operands. The following are examples of instructions that have 
implicit operands. 


Instruction       Explanation

AAA               Adjust register AL after ASCII add
CMC               Complement the value of the carry flag
CLD               Clear direction flag to 0


Register operands


An instruction with a register operand performs an action on the value that is stored 
in one of the internal registers (shown in Figure 4-1 on the following page). Specify 
register operands by using the name of the register in the operand field of the in- 
struction. Notice that not all registers are legal operands for all instructions. The 
general registers (EAX, CL, and so on) are most commonly used in data manipula- 
tion instructions. You cannot, for example, increment the contents of a segment 
register or use a control or debug register to store a memory address. 


The following examples illustrate typical instructions using register operands. 


Instruction       Explanation

INC ESI           Add 1 to contents of ESI
SUB ECX, ECX      Subtract ECX from itself, leaving 0
MOV AL, DL        Copy contents of DL into AL
MOV EAX, CR0      Copy CR0 contents into EAX
CALL EDI          Invoke subroutine whose address is in EDI


Figure 4-1. 80386/80486 register set.


Immediate operands


An immediate operand is specified when a value is part of the instruction itself. 
Consider the instruction ADD EAX, 3. In addition to the register operand EAX, the 
numeric value 3 is coded in the instruction and is stored in the code segment with 
the bit pattern that represents ADD. Other examples of instructions that use im- 
mediate operands include: 


Instruction       Explanation

MOV EAX, 7        Store the value 7 in register EAX
AND CL, 0F0H      Mask off the low-order 4 bits of CL
BT EDI, 3         Copy bit 3 of EDI to carry flag
JC 3C1H           Branch to offset 3C1H if CF is set


I/O operands


External devices that transfer data from the computer to another environment are 
called I/O (input/output) devices. Typically, a processor communicates with these
devices via a special address. The most straightforward way is for the device to have
its own address (or set of addresses) called I/O ports. I/O addressing is similar to
memory addressing, but different hardware control lines are activated. In addition, 
the processor sensibly refrains from attempting to cache values read from or written 
to I/O ports. The 80386 and 80486 each support a total of 65,536 separate I/O 
addresses. 

I/O communication is done in 8-bit, 16-bit, or 32-bit quantities. I/O addresses must
be aligned on even boundaries for word I/O and mod 4 boundaries for doubleword
I/O. The accumulator is always the source or the destination for the I/O instruction,
and the I/O port is specified with an immediate operand or by the contents of the
DX register. Notice that I/O ports expressed as immediate operands cannot exceed
8 bits, or a value of 0FFH. Examples of instructions that use I/O operands include:


Instruction Explanation 

IN AL, 04H Input a byte from port 04H 

OUT 1CH, AX Output a word to port 1CH 

IN AX, DX Input a word from port specified by DX 

IN EAX, DX Input a doubleword from port specified by DX 


Memory reference operands 


To operate on the contents of memory, you must specify the address of the data 
value you want to use. The 80386 family provides a number of addressing modes. 
There is rarely a performance penalty for using a complex addressing mode, so use 
the addressing mode that is most appropriate to your program’s needs. 


When you specify a memory address, you specify the offset from the beginning of 
the appropriate segment. Address 0 is the first byte of the memory segment, address 
1 is the second byte, and so on, regardless of the segment’s physical starting address. 
Chapter 3 contains a detailed description of how segmentation is used to generate 
memory addresses. 
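
The relationship between an offset and the address that finally reaches memory can be sketched as follows. This is a simplified model, not the processor's full algorithm: the segment structure and the seg_translate function are hypothetical, only the base and limit fields of the descriptor are represented, and the segment is assumed to be an ordinary expand-up segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment descriptor. */
struct segment {
    uint32_t base;   /* linear address of byte 0 of the segment */
    uint32_t limit;  /* highest valid offset within the segment */
};

/*
 * Translate an offset into a linear address. Returns false, leaving
 * *linear untouched, if any of the size bytes lies beyond the limit.
 */
bool seg_translate(const struct segment *seg, uint32_t offset,
                   uint32_t size, uint32_t *linear)
{
    if (size == 0 || offset > seg->limit || size - 1 > seg->limit - offset)
        return false;              /* the reference crosses the limit */

    *linear = seg->base + offset;  /* offset 0 maps to the base */
    return true;
}
```

With a base of 120000H and a limit of 0FFFFH, offset 0 translates to 120000H, and a 2-byte reference at offset 0FFFFH is rejected because its second byte falls outside the segment.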


By default, the segment used in most instructions is the one pointed to by the DS 
register. Forcing an instruction to operate on values in other segments is possible, 
however, by programming a segment prefix opcode immediately before the instruc- 
tion. Normally, the instruction MOV AL, [0] reads the first byte of the data segment 
into register AL. By applying a segment prefix, you can force the data to be fetched 
from another segment. The instructions: 


SS:
MOV AL, [0]


load the AL register with the first byte of the stack segment. Although the segment 
prefix byte comes before the instruction in the code stream, for readability the pre- 
fix is usually written as part of the memory operand. The previous example is nor- 
mally written: 


MOV AL, SS:[0]


Direct addressing

The simplest form of memory reference is called direct addressing, where the
instruction itself includes the location of the operand. The location is specified as
a 16-bit or 32-bit offset in the current segment. This offset is also known as the
displacement. The table below shows three examples of direct addressing. The
brackets differentiate data values (no brackets) and memory addresses (brackets).


Instruction                 Explanation

INC DWORD PTR [17H]         Add 1 to the 32-bit value at offset 17H
MOV AL, [1A33D4H]           Copy the memory byte to register AL
SHL BYTE PTR [1FFH], 3      Shift the memory byte left 3 bits


In the examples in this chapter, I generally use numeric memory addresses to illus- 
trate where the address values are used in an instruction. You might never need to 
use numeric memory addresses. Your programming environment will provide as- 
semblers and compilers that name locations in memory, and you will use these 
names in your program. This technique is called symbolic addressing. 


Symbolic addressing has a number of advantages over absolute numeric addressing. 
You are much less likely to make a mistake if you can refer to a variable by a mne-
monic name, such as queue_top, rather than by a number, such as 32081A3H. Also, if
you use symbolic names, the assembler keeps track of the type of the data item. For 
example, the opcode for the increment instruction is INC, but the same opcode can 
apply to 8-bit, 16-bit, or 32-bit operands. If you define a symbolic variable, the cor- 
rect instruction encoding is chosen for you. Without symbolic addressing, you must 
specify both the size and the location of the operand. For example, notice the differ- 
ence between these two operations: 


INC DWORD PTR [15F2H] ; 32-bit operand 

and 

COUNT DD ? ; allocate 32 bits with name COUNT 
INC COUNT ; increment variable 


Here are some additional examples of instructions that use symbolic addressing. 


Instruction             Explanation

COUNT DD 10             Reserve 32-bit value, initial value 10
FLAG DW ?               Reserve a single word
NAME DB 20 DUP (?)      Reserve 20 consecutive bytes
DEC COUNT               Subtract 1 from the value at COUNT
MOV AL, NAME            Copy first byte of NAME to AL
MOV AL, NAME[1]         Copy second byte of NAME to AL
OR FLAG, 4000H          Set one bit in the specified word




Based addressing 

In based addressing, a register holds the address of an operand. The register con- 
taining the memory address is called the base register, and you can use any of the 
eight general registers as a base register. When you use ESP or EBP as a base regis-
ter, the address is assumed to be an offset from the stack segment (SS) rather than
from the data segment (DS). You specify based addressing by placing the register 
name in brackets, as the following examples illustrate. 


MOV     AL, [ECX]           ; copy byte of memory at ECX into AL
DEC     WORD PTR [ESI]      ; decrement 16-bit word at ESI
XCHG    EBX, [EBX]          ; swap contents of EBX with dword at EBX
CALL    [EAX]               ; EAX holds pointer to
                            ; address of subroutine


Base plus displacement addressing 

Base plus displacement addressing is a variant of based addressing that uses a base 
register to specify a nearby location. An integer offset then modifies the base ad- 
dress to form the final destination. Base plus displacement addressing is commonly 
used in addressing components of data structures and in stack-relative addressing. 
For example, if ESI points to a record of type point, where point is a structure whose 
first element is the x coordinate and whose second element is the y coordinate, then 
you could use the instruction MOV EAX, [ESI+4] to fetch the y coordinate.


Similarly, because the base pointer EBP commonly points to the current stack frame, 
any values pushed onto the stack can be addressed by an offset from EBP. Offsets can 
be positive or negative and are interpreted as signed 32-bit integers. The assembler 
provides a construct called a struc that makes keeping track of offsets within data 
structures simple. Here is the above “point” data type example redone symbolically: 


POINT   struc                   ; define record layout
X       DD      ?
Y       DD      ?
POINT   ends
CORNER  POINT   <>              ; reserve memory
        LEA     ESI, CORNER     ; get address of variable
        MOV     EAX, [ESI].X    ; fetch the x component
        INC     [ESI].Y         ; increment the y component
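

If you are more comfortable in C, the same idea can be sketched with a C structure. The sketch below is an analogy rather than a translation of the assembly code: the load32 helper is a hypothetical function that plays the role of MOV EAX, [ESI+disp], and the displacement comes from the standard offsetof macro instead of from the assembler.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* C counterpart of the POINT record: two 32-bit fields. */
struct point {
    int32_t x;  /* at displacement 0 */
    int32_t y;  /* at displacement 4 */
};

/*
 * Hypothetical helper that mimics MOV EAX, [ESI+disp]: read the 32-bit
 * value located disp bytes past the address held in base.
 */
int32_t load32(const void *base, size_t disp)
{
    int32_t value;
    memcpy(&value, (const unsigned char *)base + disp, sizeof value);
    return value;
}
```

For a variable declared as struct point corner = { 10, 20 }, the call load32(&corner, offsetof(struct point, y)) returns 20, just as [ESI].Y selects the second field when ESI holds the address of CORNER.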


Index plus displacement addressing 

Indexing is implemented by using the contents of a register as a component of an
address. Any of the eight general registers except ESP is a legal index register. In-
dex plus displacement addressing is most useful in dealing with arrays. A direct ad-
dress points to the starting address of the array, and the index specifies the element
of the array. Here are three examples of index plus displacement addressing:


MOV     AL, 7ACH[ESI]       ; get byte of array based at 7AC w/index
IMUL    VECTOR[ECX]         ; multiply EAX by element indexed by ECX
SUB     ARRAY[EAX], 2       ; subtract 2 from element of array


It might appear that index plus displacement is the same as base plus displacement.
However, indexing offers a capability that based addressing does not have. 


The C language code fragment in the following example computes the sum of the 
squares of an array. 


int V[V_MAX];
register int i;


sum = 0;
for (i = 0; i < V_MAX; i++)
    sum += V[i] * V[i];


Assuming that the size of an integer is 32 bits, two separate values are required to 
progress through the array: the index variable i and the offset in memory of V[i]. For
example, when i is 3, the address of V[3] is the address of V plus 12 (4 x 3) bytes. Ev-
ery time i is used as an index into the array, it must be multiplied by the size of the
array element. The assembly code to execute the above loop might look like this: 


        XOR     ECX, ECX        ; clear ECX (counter) to 0
        MOV     SUM, ECX        ; copy 0 to SUM
L1:     CMP     ECX, V_MAX      ; is counter > or = V_MAX?
        JGE     DONE            ; yes - go on
        MOV     EAX, ECX        ; copy the index
        SHL     EAX, 2          ; multiply by 4 to get the offset
        MOV     EAX, V[EAX]     ; load contents of array into EAX
        IMUL    EAX             ; square the array element
        ADD     SUM, EAX        ; compute the sum
        INC     ECX             ; bump the counter
        JMP     L1              ; loop back to the top


DONE:


The MOV, SHL, and MOV instructions that follow JGE show the conversion from
array index to memory offset and the addressing of the selected item.


The 80386 and the 80486 provide a special optimization for arrays whose elements 
are 1, 2, 4, or 8 bytes. The processor adjusts the index to produce a memory offset. 
This adjustment is called scaling and is indicated in assembly language by placing a 
multiply operation in the brackets that enclose the index register. The above ex- 
ample becomes: 


        XOR     ECX, ECX        ; clear ECX (counter) to 0
        MOV     SUM, ECX        ; copy 0 to SUM
L1:     CMP     ECX, V_MAX      ; is counter > or = V_MAX?
        JGE     DONE            ; yes - go on
        MOV     EAX, V[ECX*4]   ; load contents of array into EAX
        IMUL    EAX             ; square the array element
        ADD     SUM, EAX        ; compute the sum
        INC     ECX             ; bump the counter
        JMP     L1              ; loop back to the top


DONE:

The second version of the program does not require the index value to be copied
and multiplied, so the program runs faster. Also, the instruction:


MOV EAX, V[ECX*4]


takes no longer to execute than the instruction:


MOV EAX, V[EAX]


When EBP is used as a scaled index register, it does not force the memory reference
relative to the stack segment as it does when used as a base register. When an in-
struction specifies both a base register and an index register and one of them is
EBP, EBP is assumed to be the base register unless a scale factor is present. If a scale
factor exists, EBP is assumed to be the index register. The following table shows four
examples:


Instruction                       Explanation

ADD [ECX][EBP], 7                 EBP is base, SS segment used
MOV AX, ARRAY[EBP]                EBP is base, SS segment used
MOV EAX, [ECX][EBP*4]             ECX is base, DS segment used
INC BYTE PTR [ECX+8][EBP].x       EBP is base, SS segment used


Unlike the 8086 and the 8088, which require anywhere from 5 through 17 clocks to 
compute the operand address (depending on the complexity of the operands), the 
80386 requires no additional time to compute the effective address unless both a 
base register and an index register are used to select the operand. When both 
registers select the operand, execution time increases by only one clock cycle. In 
the 80486, an additional clock might or might not be required, depending on how 
the instructions have been pipelined. However, in the 80486, 1 clock must be added 
to the execution times of instructions that use based addressing if the base register 
was loaded by the instruction immediately preceding the instruction that uses it. 


Base plus displacement plus index addressing 

Base plus displacement plus index addressing is the most complex addressing 
mode. This addressing form is used to address data structures stored on the stack or 
to address arrays whose base address is contained in a register. When these arrays 
are being addressed, the displacement value is 0 and the programmer need not 
specify it, although the assembler encodes a 0 displacement into the instruction. 
The index register can contain a scale value, as it does in index plus displacement 
addressing mode. Following are examples of base plus displacement plus index 
addressing: 

Instruction                       Explanation

MOV EAX, [EBP+8][ESI]             Array is on stack beginning at EBP + 8
INC WORD PTR [EBX+EAX*2]          16-bit vector based at EBX, with index
MOV EDX, PT[EAX+8][ESI].Y         Array of “point” data structures


The final example above appears to contain two displacement values: the initial 
displacement that specifies the start of the array, and the displacement of structure 
element Y in the indexed array element. The assembler simply offers these values 
for clarity. In the machine instruction, the displacement field contains the sum of 
the two values, as calculated by the assembler. 
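
All of the memory addressing modes described so far are special cases of one calculation: base plus scaled index plus displacement. The following C function is a minimal sketch of that arithmetic. The name effective_address and its parameter list are my own, not part of any assembler or Intel interface; an unused component is simply passed as 0 (or as a scale of 1).

```c
#include <stdint.h>

/*
 * Compute a 32-bit effective address (the offset within the segment).
 * scale must be 1, 2, 4, or 8. The displacement is signed, and the sum
 * wraps modulo 2^32, as unsigned 32-bit arithmetic does in C.
 */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For example, with a base of 1000H, an index of 3, a scale of 4, and a displacement of 8, the function returns 1014H. Direct addressing corresponds to a base and index of 0, and based addressing to an index and displacement of 0.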


Stack-based addressing 

A stack is a data structure in which the value most recently stored is the first value
retrieved. The acronym LIFO (last in, first out) describes the action of a stack and
contrasts with the FIFO (first in, first out) structure. Figure 4-2 illustrates the LIFO
and FIFO structures.


Figure 4-2. LIFO, FIFO.


Stack instructions typically refer to only a single operand. The other operand, the 
stack, is implicit in the instruction. The processor assumes that all memory in the 
stack segment (that is, the segment pointed to by the SS register) belongs to the 
stack, but this is not always true. Often, DS and SS point to the same segment; part 
of the segment contains program data, and part is reserved for the stack. In this 
situation, the programmer might need to write code to check for stack overflow, 
which occurs if too many items are pushed onto the stack and it runs over into the 
data area. 

When a value is stored on the stack, or pushed, the ESP register is tested to see
whether it is greater than or equal to 4. If it is not, a stack fault (interrupt 12) is gen- 
erated; otherwise, ESP is decremented by 4, and the operand is stored at SS:[ESP]. 
The most recently pushed value, to which register ESP always points, is called the 
top-of-stack. 


The POP operation retrieves the most recently pushed value from the stack. First, 
ESP is compared with the limit of the stack segment. If the memory reference is out- 
side the limit, a stack fault is generated; otherwise, the value at SS:[ESP] is read, and 
ESP is incremented by 4. 
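
The push and pop sequences just described can be modeled in C. Treat the following as a sketch under stated assumptions: stack_model, push32, and pop32 are hypothetical names, the stack segment is represented by an ordinary byte array, and a stack fault is reported by returning false rather than by raising interrupt 12.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a stack segment and its stack pointer. */
struct stack_model {
    uint8_t *mem;    /* memory that backs the stack segment */
    uint32_t limit;  /* highest valid offset in the segment */
    uint32_t esp;    /* offset of the current top-of-stack */
};

/* PUSH: test ESP, decrement it by 4, then store the value at SS:[ESP]. */
bool push32(struct stack_model *s, uint32_t value)
{
    if (s->esp < 4)
        return false;                     /* would be a stack fault */
    s->esp -= 4;
    memcpy(s->mem + s->esp, &value, 4);
    return true;
}

/* POP: check the limit, read the value at SS:[ESP], then add 4 to ESP. */
bool pop32(struct stack_model *s, uint32_t *value)
{
    if (s->esp > s->limit || s->limit - s->esp < 3)
        return false;                     /* would be a stack fault */
    memcpy(value, s->mem + s->esp, 4);
    s->esp += 4;
    return true;
}
```

With a 16-byte segment (limit 15) and ESP initially 16, four pushes succeed and the fifth fails; popping from the empty stack also fails because ESP lies beyond the limit.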


The PUSH and POP instructions cause immediate values, register values, or the con- 
tents of a memory location to be stored to and retrieved from the stack. Also, some 
instructions that cause a transfer of control (change the EIP register) push the old 
execution address onto the stack. This allows the subroutine to return to the pre- 
vious point of execution. 


The most commonly used instruction that changes the EIP register is CALL. The 
CALL instruction has one operand, the address of a routine to be executed. The 
value of EIP (which points to the instruction immediately following the CALL) is 
pushed onto the stack, and EIP is set to the address specified by the CALL operand. 
The RET (or return) instruction pops the current top-of-stack into the EIP register, 
returning control to the instruction after the initial CALL. 


A routine passes information to another routine by storing values on the stack before 
executing a CALL instruction. The standard way this information is structured is 
called the frame of the called routine or the call stack. Figure 4-3 on the following 
page illustrates a subroutine call and shows how the stack frame is structured. 


Programs can push and pop 16-bit values by specifying registers AX, BX, SI, and so 
on or by specifying 16-bit memory references. It is more efficient, however, to push 
the contents of the 32-bit register (for example, EAX for AX) and to disregard the 
high-order bits. Use the MOVSX or MOVZX instruction to copy memory operands to 
a register and extend them to 32 bits before they are pushed onto the stack. The 
reason for doing this relates to how the 80386 and the 80486 interface with memory. 
If the physical memory address is a multiple of 4 (that is, if the address is on a dword 
boundary), then a single memory reference cycle can fetch as many as 4 bytes. If 
the physical memory address is offset from the dword boundary, then at least two 
additional clock cycles are required to read or to write a 32-bit value. 
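
A rough way to see the effect of alignment is to count how many dword-wide memory cycles a reference needs. The function below is only a sketch: bus_cycles is a hypothetical name, the model counts the aligned dwords that the reference touches, and it ignores the actual clock counts, which depend on the processor and the memory system.

```c
#include <stdint.h>

/*
 * Count the aligned 4-byte units touched by a reference of size bytes
 * (size >= 1) that starts at addr. Each unit costs one memory cycle in
 * this simplified model. Addresses near 2^32 are not handled.
 */
uint32_t bus_cycles(uint32_t addr, uint32_t size)
{
    uint32_t first = addr / 4;               /* dword holding the first byte */
    uint32_t last = (addr + size - 1) / 4;   /* dword holding the last byte */
    return last - first + 1;
}
```

A 32-bit reference at address 1000H needs one cycle in this model, but the same reference at 1002H needs two. That is exactly the position ESP is left in after a single 16-bit push onto a dword-aligned stack.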


Therefore, after executing a 16-bit push, all subsequent 32-bit stack references 
degrade in performance by at least 30 percent. In addition, if a 32-bit program with 
16-bit pushes is ever run on the 80486 with alignment checking enabled, the pro-
gram will generate an alignment fault. In protected mode, the 80386 and 80486 gener-
ate 32-bit references when the 16-bit segment registers (CS, SS, DS, ES, FS, and GS)
are pushed or popped, so performance degradation is not an issue in this case. 


Figure 4-3. Use of the stack. (Panels: initial stack; stack after PUSH x; stack after
CALL subr and ENTER 8; stack after LEAVE and RET 4.)


Instruction Categories 


The operations that can be performed vary widely, reflecting both the wide range of 
the CPU’s capabilities and its compatibility with previous processors. In this section, 
I divide the instruction set into a number of related categories and identify the most 
important instructions of each category. 


Arithmetic

Arithmetic instructions perform signed and unsigned integer operations on oper- 
ands of 8, 16, and 32 bits. With few exceptions, these instructions have the form: 


OPCODE dest, src 

Generally, arithmetic instructions operate on source and destination operands and
store the result in the location specified by the destination operand. The destination 
operand can be a memory reference or a register, and the source operand can be 
memory, a register, or an immediate data value. Both the source and the destination 
operands cannot be memory references, however. The instructions that fit this for- 
mat are: 


Instruction Explanation 

ADD Integer addition 
ADC Add with carry 

SUB Subtract 

SBB Subtract with borrow 
CMP Compare integers 


These instructions affect the AF, CF, OF, PF, SF, and ZF bits of the EFLAGS register, 
depending on the results of the operation. 


In addition to the double-operand (or dyadic) instructions, there are single-operand 
(or monadic) instructions: 


Instruction Explanation 
INC Increment by 1 
DEC Decrement by 1 


Each of these instructions takes a single operand, either a register or a memory
reference. These instructions affect the same EFLAGS bits, except that they do not
modify the carry flag (CF).


Finally, there are the irregular integer arithmetic instructions: 


Instruction Explanation 

DIV Unsigned divide 
IDIV Signed divide 
MUL Unsigned multiply 
IMUL Signed multiply 


The DIV, IDIV, and MUL instructions take a single source operand. The destination 
operand is implicitly the accumulator and depends on the size of the operands. 
Destination operands are defined as follows: 


Operand Size    Register
8 bits          AL
16 bits         AX
32 bits         EAX
64 bits         EDX,EAX
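

The implicit accumulator operands can be illustrated with a small Python sketch of the 32-bit forms of MUL and DIV. The helper names mul32 and div32 are hypothetical and exist only for this illustration; the sketch assumes that MUL leaves its 64-bit product in EDX:EAX and that DIV divides EDX:EAX by the source, leaving the quotient in EAX and the remainder in EDX.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, src):
    """Model MUL src (32-bit): EDX:EAX <- EAX * src, unsigned."""
    product = (eax & MASK32) * (src & MASK32)
    return (product >> 32) & MASK32, product & MASK32    # (EDX, EAX)

def div32(edx, eax, src):
    """Model DIV src (32-bit): EAX <- quotient, EDX <- remainder."""
    src &= MASK32
    if src == 0:
        raise ZeroDivisionError("divide error (interrupt 0)")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK32:
        # The hardware also raises a divide error when the quotient
        # does not fit in the destination register.
        raise OverflowError("quotient does not fit in EAX (interrupt 0)")
    return quotient, remainder                            # (EAX, EDX)
```

Multiplying 0FFFFFFFFH by 2 in this model leaves 1 in EDX and 0FFFFFFFEH in EAX, and dividing that pair by 2 recovers the original value.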


Because of its usefulness in computing array and structure element offsets, the 
IMUL instruction has three different forms: 


Instruction Explanation 

IMUL src                accum = accum x src
IMUL dest, src          dest = dest x src
IMUL dest, src, data    dest = src x data


The DIV and IDIV instructions leave the status flags in undefined states. The MUL 
and IMUL instructions modify CF and OF, leaving SF, ZF, AF, and PF undefined. 


Decimal arithmetic 


Six instructions help implement decimal math routines. The standard integer in- 
structions perform computations, and the following instructions adjust the result 
because the operands are not integers but BCD encodings. The following instruc- 
tions have either the AL or the AX accumulator as an implicit operand: 


Instruction Explanation 

AAA ASCII adjust after addition 

AAD ASCII adjust before division 

AAM ASCII adjust after multiply 

AAS ASCII adjust after subtraction 

DAA Decimal adjust after addition 

DAS Decimal adjust after subtraction 
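

To make the adjustment step concrete, here is a hedged Python sketch of DAA applied after an ordinary ADD of two packed BCD bytes. The function names daa and add_packed_bcd are hypothetical, and the adjustment rule shown is the commonly documented one (add 6 if the low digit overflowed, add 60H if the high digit overflowed); consult Chapter 8 for the authoritative definition.

```python
def daa(al, cf=0, af=0):
    """Model DAA: adjust AL after adding two packed BCD bytes.

    Returns (al, cf, af) after the adjustment.
    """
    old_al = al
    if (al & 0x0F) > 9 or af:
        al = (al + 6) & 0xFF          # low digit went past 9
        af = 1
    else:
        af = 0
    if old_al > 0x99 or cf:
        al = (al + 0x60) & 0xFF       # high digit went past 9
        cf = 1
    else:
        cf = 0
    return al, cf, af

def add_packed_bcd(x, y):
    """Model ADD AL, y followed by DAA for two packed BCD bytes."""
    total = x + y
    al = total & 0xFF
    cf = 1 if total > 0xFF else 0
    af = 1 if (x & 0x0F) + (y & 0x0F) > 0x0F else 0
    return daa(al, cf, af)
```

For example, adding the BCD bytes 38H and 45H gives the binary sum 7DH, which DAA adjusts to 83H, the BCD encoding of 38 + 45.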
Logical 


The following instructions are called logical because they make no semantic
assumptions about their operands; that is, they do not regard the operands as
integers, BCD digits, character strings, and so on. The instructions are strictly Boolean,
or bit-by-bit, operations. First is a set of dyadic functions similar to the arithmetic
instructions:


Instruction     Explanation
AND             Boolean AND
OR              Boolean OR
XOR             Exclusive OR
TEST            Performs an AND but modifies only the EFLAGS register


A single monadic instruction, NOT, performs a logical complement of the operand. 
With the exception of NOT, the logical instructions modify each of the OF, SF, ZF, 
PF, and CF flags according to the outcome of the operation. The AF flag is left 
undefined. 


A series of instructions operates on bit strings. These instructions have the form: 
OPCODE dest, index 


where dest selects a bit string, either in memory or in a register, and index identifies
the particular bit in the bit string that is the subject of the operation. The index
value is either contained in a register or specified as an immediate value. If dest is a
memory location and index is in a register, index is treated as a signed integer and
can take on any value from -2^31 through +2^31 - 1. Instructions that operate on bit
strings are BT, BTC, BTR, and BTS.


Instruction Explanation 

BT Bit test (save the value of the selected bit in CF) 

BTC Bit test and complement (save bit, then complement dest bit) 
BTR Bit test and reset (save bit, then clear dest bit to 0) 

BTS Bit test and set (save bit, then set dest bit to 1) 


Figure 4-4 shows bit indexing in these instructions. 


[Figure: bit indexes relative to the dest address for a bit string occupying bytes 2A8H through 2ACH, illustrated with index = -26.]


Figure 4-4. Bit indexing in BT instructions.
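

The signed bit indexing can be sketched in Python as follows. This is only a model under stated assumptions: memory is a flat bytearray, base is the byte offset of dest within it, and the helper names (locate_bit, bit_test, bit_test_and_set) are hypothetical. The byte that holds the bit is found by an arithmetic shift of the index, so negative indexes select bytes below dest.

```python
def locate_bit(memory, base, index):
    """Return (byte_offset, bit_in_byte) for a signed bit index."""
    byte_offset = base + (index >> 3)   # arithmetic shift floors toward -infinity
    if not 0 <= byte_offset < len(memory):
        raise IndexError("bit index falls outside the modeled memory")
    return byte_offset, index & 7       # bit number is always 0..7

def bit_test(memory, base, index):
    """Model BT: return the selected bit (the value saved in CF)."""
    byte_offset, bit = locate_bit(memory, base, index)
    return (memory[byte_offset] >> bit) & 1

def bit_test_and_set(memory, base, index):
    """Model BTS: return the old bit, then set the bit to 1."""
    byte_offset, bit = locate_bit(memory, base, index)
    old = (memory[byte_offset] >> bit) & 1
    memory[byte_offset] |= 1 << bit
    return old
```

With index = -26, the model selects bit 6 of the byte four bytes below dest, because -26 = (-4 x 8) + 6.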


Two instructions search bit strings. These instructions have the form: 


Instruction         Explanation
BSF dest, src       Bit scan forward
BSR dest, src       Bit scan reverse


where src indicates the location of a bit string. The dest operand must be a register 
that receives the index of the first nonzero bit. The dest operand can be only a 16- 
bit or 32-bit register and indicates whether the src operand is a 16-bit or 32-bit quan- 
tity. Figure 4-5 shows how these instructions work. 


[Figure: BSF EAX, EAX and BSR EAX, EAX applied to a 32-bit value in EAX, showing where each scan starts and the resulting index.]


Figure 4-5. Bit scanning.
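

A minimal Python sketch of the two scans follows. The function names bsf and bsr are hypothetical stand-ins for the instructions, and returning None for a zero source is a modeling choice: on the processor, a zero source sets ZF and leaves dest undefined.

```python
def bsf(src):
    """Model BSF: index of the least significant 1 bit, or None if src is 0."""
    if src == 0:
        return None                      # hardware sets ZF; dest is undefined
    return (src & -src).bit_length() - 1  # isolate the lowest set bit

def bsr(src):
    """Model BSR: index of the most significant 1 bit, or None if src is 0."""
    if src == 0:
        return None
    return src.bit_length() - 1
```

For the value 1001000B, the forward scan reports bit 3 and the reverse scan reports bit 6.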


The final logical instructions are shift and rotate instructions. Figure 4-6 illustrates 
what shift and rotate instructions do. 


[Figure: bit movement through the operand and the carry flag (CY) for SHL, SHR, SAL, SAR, and RCL.]


Figure 4-6. Shift and rotate instructions.


Most of these instructions have the form:


OPCODE dest, COUNT


The destination is either a memory reference or a register. The COUNT is either an 
immediate value or the CL register. The following instructions fit this format: 


Instruction Explanation 

SHL Shift left logical 

SHR Shift right logical 

SAL Shift arithmetic left 

SAR Shift arithmetic right 

ROL Rotate left 

ROR Rotate right 

RCL Rotate through carry left 
RCR Rotate through carry right 
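

The differences among these instructions are easiest to see side by side. The following Python sketch models the 32-bit forms of SHR, SAR, ROL, and RCL; the function names are hypothetical, and the sketch assumes that the processor uses only the low 5 bits of COUNT.

```python
MASK32 = 0xFFFFFFFF

def shr(value, count):
    """Logical right shift: zeros enter at bit 31. Returns (result, CF)."""
    count &= 31                            # assumed: count is masked to 5 bits
    value &= MASK32
    if count == 0:
        return value, None                 # flags are unchanged
    return value >> count, (value >> (count - 1)) & 1

def sar(value, count):
    """Arithmetic right shift: copies of the sign bit enter at bit 31."""
    count &= 31
    value &= MASK32
    if count == 0:
        return value, None
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> count) & MASK32, (value >> (count - 1)) & 1

def rol(value, count):
    """Rotate left: bits leaving bit 31 re-enter at bit 0."""
    count &= 31
    value &= MASK32
    return ((value << count) | (value >> (32 - count))) & MASK32

def rcl(value, cf, count=1):
    """Rotate left through carry: CF acts as a 33rd bit. Returns (result, CF)."""
    wide = ((cf & 1) << 32) | (value & MASK32)       # 33-bit quantity CF:value
    count = (count & 31) % 33
    wide = ((wide << count) | (wide >> (33 - count))) & ((1 << 33) - 1)
    return wide & MASK32, wide >> 32
```

Shifting 80000001H right by one bit yields 40000000H with SHR but 0C0000000H with SAR, because SAR preserves the sign bit.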


The following double shift instructions are also provided: 


Instruction Explanation 
SHLD dest, src, COUNT Shift left double 
SHRD dest, src, COUNT Shift right double 


In the above instructions, the source and the destination are concatenated and 
shifted, and the result is truncated and stored in the destination operand. Figure 4-7 
illustrates double shift instructions. 


Figure 4-7. Double shifts. 
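

The concatenate, shift, and truncate sequence can be sketched in Python for 32-bit operands. The names shld and shrd are hypothetical models of the instructions, and the sketch assumes that COUNT is taken modulo 32.

```python
MASK32 = 0xFFFFFFFF

def shld(dest, src, count):
    """Model SHLD: dest shifts left; vacated low bits come from the top of src."""
    count &= 31                                       # assumed: count modulo 32
    if count == 0:
        return dest & MASK32
    wide = ((dest & MASK32) << 32) | (src & MASK32)   # dest:src as 64 bits
    return ((wide << count) >> 32) & MASK32

def shrd(dest, src, count):
    """Model SHRD: dest shifts right; vacated high bits come from the bottom of src."""
    count &= 31
    if count == 0:
        return dest & MASK32
    wide = ((src & MASK32) << 32) | (dest & MASK32)   # src:dest as 64 bits
    return (wide >> count) & MASK32
```

Only dest changes; src supplies bits but is not modified.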


Data transfer


Probably the most frequently used instructions are in the data transfer category. To 
the assembly programmer, a single instruction appears to do almost all the work. 
Actually, the MOV mnemonic is encoded into one of several opcodes, depending on 
the operands involved. The general form of the MOV instruction is: 


MOV dest, src 


Either the dest or the src operand can be a memory reference, but not both. Both
operands can be registers, and the src operand can be an immediate value for most
choices of dest. This instruction is not restricted to operating on the general registers.
The MOV instruction is the only instruction you can use to read or modify the
control registers (CR0-CR3) and the debug and test registers (DR0-DR7, TR6-TR7).
You can also use the MOV instruction to load and store the segment registers DS, SS,
ES, FS, and GS.


Not all possible combinations of src and dest are legal instructions. The restrictions
are covered in Chapter 8.


Here are five additional data transfer instructions: 


Instruction             Explanation
XCHG dest, src          Exchange the contents of the two operands
BSWAP reg               Convert to other-endian (80486 only)
MOVSX dest, src         Move src into dest, sign-extending src
MOVZX dest, src         Move src into dest, zero-extending src
SETcc dest              Set dest to 0 or 1 depending on condition codes


The XCHG instruction takes two operands and swaps their contents. One operand 
must be a register; the other can be a register or a memory reference. Because this 
instruction is frequently used to implement semaphores, the hardware bus LOCK 
signal is asserted whenever one of the operands is a memory reference. 


The BSWAP instruction operates on a 32-bit register and swaps byte 0 with byte 3 
and byte 2 with byte 1. This will convert a “big-endian” number to “little-endian” 
format, and vice versa. 
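

As a quick illustration, the byte reversal can be modeled in a few lines of Python. The function name bswap is a hypothetical stand-in for the instruction.

```python
def bswap(value):
    """Model BSWAP reg32: reverse the order of the four bytes."""
    value &= 0xFFFFFFFF
    return int.from_bytes(value.to_bytes(4, "little"), "big")
```

Applying the operation twice restores the original value, which is why the same instruction converts in both directions.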


The MOVSX and MOVZX instructions are similar to MOV, but they take a src operand 
of a single byte and either sign-extend it (MOVSX) or zero-extend it (MOVZX) into a 
16-bit or 32-bit integer at the dest location. If src is a word, it is extended appropriately 
to a doubleword. 
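

The two kinds of extension can be sketched in Python as follows. The function names movzx and movsx and their size parameters are hypothetical; they model only the value that arrives in dest.

```python
def movzx(src, src_bits):
    """Model MOVZX: the high-order bits of dest are filled with zeros."""
    return src & ((1 << src_bits) - 1)

def movsx(src, src_bits, dest_bits):
    """Model MOVSX: the high-order bits of dest are copies of src's sign bit."""
    src &= (1 << src_bits) - 1
    if src & (1 << (src_bits - 1)):        # sign bit set: value is negative
        src -= 1 << src_bits
    return src & ((1 << dest_bits) - 1)
```

The byte 80H becomes 00000080H under MOVZX but 0FFFFFF80H under MOVSX, because its sign bit is set.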


SETcc instructions move a 0 or a 1 into the destination, depending on the value of
the condition codes in the EFLAGS register. The conditions supported are:


Instruction     Explanation
SETA dest       Set to 1 if above (unsigned x > y) / CF = 0 & ZF = 0
SETAE dest      Set to 1 if above or equal / CF = 0
SETB dest       Set to 1 if below (unsigned x < y) / CF = 1
SETBE dest      Set to 1 if below or equal / CF = 1 or ZF = 1
SETC dest       Set to 1 if carry / CF = 1
SETE dest       Set to 1 if equal / ZF = 1
SETG dest       Set to 1 if greater (signed x > y) / SF = OF & ZF = 0
SETGE dest      Set to 1 if greater or equal / SF = OF
SETL dest       Set to 1 if less (signed x < y) / SF != OF
SETLE dest      Set to 1 if less or equal / SF != OF or ZF = 1
SETNA dest      Set to 1 if not above (SETBE)
SETNAE dest     Set to 1 if not above or equal (SETB)
SETNB dest      Set to 1 if not below (SETAE)
SETNBE dest     Set to 1 if not below or equal (SETA)
SETNC dest      Set to 1 if no carry / CF = 0
SETNE dest      Set to 1 if not equal / ZF = 0
SETNG dest      Set to 1 if not greater (SETLE)
SETNGE dest     Set to 1 if not greater or equal (SETL)
SETNL dest      Set to 1 if not less (SETGE)
SETNLE dest     Set to 1 if not less or equal (SETG) / SF = OF & ZF = 0
SETNO dest      Set to 1 if no overflow / OF = 0
SETNP dest      Set to 1 if no parity / PF = 0
SETNS dest      Set to 1 if no sign / SF = 0
SETNZ dest      Set to 1 if not 0 / ZF = 0
SETO dest       Set to 1 if overflow / OF = 1
SETP dest       Set to 1 if parity / PF = 1
SETPE dest      Set to 1 if parity even / PF = 1
SETPO dest      Set to 1 if parity odd / PF = 0
SETS dest       Set to 1 if sign / SF = 1
SETZ dest       Set to 1 if 0 / ZF = 1
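

The flag expressions in the table can be checked with a small Python model. This is a sketch under stated assumptions: cmp_flags is a hypothetical helper that computes the flags a CMP x, y would produce, and CONDITIONS covers only a representative subset of the condition codes.

```python
def cmp_flags(x, y, bits=32):
    """Model CMP x, y: compute x - y and return the resulting flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    x &= mask
    y &= mask
    result = (x - y) & mask
    return {
        "CF": int(x < y),                                  # unsigned borrow
        "ZF": int(result == 0),
        "SF": int(bool(result & sign)),
        "OF": int(bool((x ^ y) & (x ^ result) & sign)),    # signed overflow
    }

CONDITIONS = {
    "A":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "AE": lambda f: f["CF"] == 0,
    "B":  lambda f: f["CF"] == 1,
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "E":  lambda f: f["ZF"] == 1,
    "NE": lambda f: f["ZF"] == 0,
    "G":  lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,
    "GE": lambda f: f["SF"] == f["OF"],
    "L":  lambda f: f["SF"] != f["OF"],
    "LE": lambda f: f["SF"] != f["OF"] or f["ZF"] == 1,
}

def setcc(cc, flags):
    """Model SETcc dest: return 1 if the condition holds, otherwise 0."""
    return int(CONDITIONS[cc](flags))
```

Comparing 0FFFFFFFFH with 1 shows why separate signed and unsigned conditions exist: the value is above 1 when treated as unsigned, but less than 1 when treated as the signed value -1.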

Stack 


The stack instructions store and retrieve data from the stack. The PUSH instruction 
writes its operand to the stack, and the POP instruction removes the top-of-stack 
element and stores it in the location specified by its operand. 


The PUSHAD and POPAD instructions require no operands; they save all the general
registers on the stack or restore them from it. Figure 4-8 shows the stack
after a PUSHAD has been executed. Although PUSHAD stores the value of the ESP
register, POPAD does not reload ESP from the saved image. The new ESP value is
always the old ESP value plus the number of bytes required to store the general
register context.


[Figure: the stack before PUSHAD, after PUSHAD (saved registers from high memory to low memory: EAX, ECX, EDX, EBX, old ESP, EBP, ESI, EDI), and after POPAD, with the position of ESP marked in each case.]


Figure 4-8. PUSHAD context.
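

The treatment of ESP can be illustrated with a Python model. The register dictionary, the address-to-dword stack dictionary, and the function names pushad and popad are all assumptions made for this sketch; the push order shown is the commonly documented one.

```python
PUSHAD_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def pushad(regs, stack):
    """Model PUSHAD. regs maps register names to values; stack maps address to dword."""
    old_esp = regs["ESP"]
    for name in PUSHAD_ORDER:
        regs["ESP"] -= 4
        # The ESP slot records the value ESP had before PUSHAD began.
        stack[regs["ESP"]] = old_esp if name == "ESP" else regs[name]

def popad(regs, stack):
    """Model POPAD. The saved ESP image is skipped rather than reloaded."""
    for name in reversed(PUSHAD_ORDER):
        if name != "ESP":
            regs[name] = stack[regs["ESP"]]
        regs["ESP"] += 4
```

Even if the saved ESP image is overwritten while it is on the stack, POPAD in this model still leaves ESP at its old value, because ESP is simply advanced past the eight saved dwords.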


Control transfer 


Control transfer instructions affect the flow of execution. Normally, an instruction is 
fetched from the address held in the EIP register, and then EIP is incremented by 
the size of the instruction so that it points to the next instruction. The new opcode 
is fetched, and the cycle continues. 


The 80386 supports branch instructions, which alter EIP, and subroutine call in- 
structions, which save the old EIP and then modify the EIP register. The software 
interrupt instruction is similar to the subroutine call except that an interrupt number 
is specified rather than an address. The address of the destination routine is then 
determined by a gate in the IDT. Figure 4-9 shows how JMP and CALL instructions 
affect the flow of execution. 


Branch instructions exist in both conditional and unconditional forms. Uncondi- 
tional jumps occur immediately when the appropriate instruction is encountered. 
All calls and software interrupts are unconditional. 


Conditional branches test certain bits in the EFLAGS register to determine whether
to branch or not. These bits are usually set as the result of a compare instruction
(CMP) or as the result of an arithmetic or a logical operation. These branches are to
relative addresses; the offset is a signed displacement from the current EIP. The
following list shows the conditions that can be tested for and the mnemonic for each
instruction.


[Figure: the flow of instructions for a JMP and for a CALL/RET pair.]


Figure 4-9. JMP and CALL instructions.


Instruction         Explanation
JA offset           Jump above (unsigned x > y) / CF = 0 & ZF = 0
JAE offset          Jump above or equal / CF = 0
JB offset           Jump below (unsigned x < y) / CF = 1
JBE offset          Jump below or equal / CF = 1 or ZF = 1
JC offset           Jump if carry / CF = 1
JCXZ offset         Jump if CX = 0
JECXZ offset        Jump if ECX = 0
JE offset           Jump equal / ZF = 1
JG offset           Jump greater (signed x > y) / SF = OF & ZF = 0
JGE offset          Jump greater or equal / SF = OF
JL offset           Jump less (signed x < y) / SF != OF
JLE offset          Jump less or equal / SF != OF or ZF = 1
JNA offset          Jump not above (JBE)
JNAE offset         Jump not above or equal (JB)
JNB offset          Jump not below (JAE)
JNBE offset         Jump not below or equal (JA)
JNC offset          Jump no carry / CF = 0
JNE offset          Jump not equal / ZF = 0
JNG offset          Jump not greater / SF != OF or ZF = 1
JNGE offset         Jump not greater or equal (JL)
JNL offset          Jump not less (JGE)
JNLE offset         Jump not less or equal (JG)
JNO offset          Jump no overflow / OF = 0
JNP offset          Jump no parity / PF = 0
JNS offset          Jump no sign / SF = 0
JNZ offset          Jump not 0 / ZF = 0
JO offset           Jump if overflow / OF = 1
JP offset           Jump if parity / PF = 1
JPE offset          Jump parity even / PF = 1
JPO offset          Jump parity odd / PF = 0
JS offset           Jump if sign / SF = 1
JZ offset           Jump if 0 / ZF = 1


Three other conditional branch instructions are the loop instructions. Loop instruc- 
tions decrement the ECX register and branch if the conditions outlined in the fol- 
lowing list are met. 


Instruction Explanation 

LOOP offset Decrement, branch if ECX != 0 

LOOPZ offset Decrement, branch if ECX != 0 and ZF = 1 
LOOPNZ offset Decrement, branch if ECX != 0 and ZF = 0 


LOOPE and LOOPNE are synonyms for LOOPZ and LOOPNZ. 


String 


String instructions handle large blocks of memory with ease. A string instruction 
can move a block from one location in memory to another, compare one block with 
another, or search a string for a specific value. String instructions use specific regis- 
ters for storing operands. DS and ESI always point to the source memory block. ES 
and EDI point to the destination. These pointers are incremented (or decremented) 
by the size of the operand (1, 2, or 4 bytes) every time the string instruction 
executes. 


The direction flag (DF) determines whether the source and the destination pointers 
are incremented or decremented. When the direction flag is 0, the addresses are in- 
cremented. When the flag is 1, addresses are decremented. The string instructions 
provide the following capabilities: 


Instruction     Explanation
MOVS            Move string: copy string at DS:ESI to ES:EDI
CMPS            Compare string: compare DS:ESI to ES:EDI
STOS            Store the accumulator at ES:EDI
LODS            Load the accumulator with DS:ESI
SCAS            Scan string: compare ES:EDI with accumulator


You can execute any of these instructions repeatedly by placing a count value in the
ECX register and preceding the string instruction with the REP prefix. The compare
and scan instructions, which modify the flag bits, can also be prefixed by REPE
(repeat while equal) or REPNE (repeat while not equal), allowing fast
compare and search operations.
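

The repeated forms can be modeled in Python to show how the pointers and the count change. This sketch makes simplifying assumptions: memory is one flat bytearray, so the DS and ES segments are ignored, only the byte-sized forms are modeled, and the function names are hypothetical.

```python
def rep_movsb(memory, esi, edi, ecx, df=0):
    """Model REP MOVSB. Returns the final (esi, edi, ecx)."""
    step = -1 if df else 1              # DF = 1 walks downward through memory
    while ecx != 0:
        memory[edi] = memory[esi]
        esi += step
        edi += step
        ecx -= 1
    return esi, edi, ecx

def repne_scasb(memory, edi, ecx, al, df=0):
    """Model REPNE SCASB: scan until a byte equals AL or ECX reaches 0.

    Returns (edi, ecx, zf). In this model zf starts at 0; on the
    processor, ZF is left unchanged if ECX is 0 on entry.
    """
    step = -1 if df else 1
    zf = 0
    while ecx != 0:
        zf = int(memory[edi] == al)
        edi += step
        ecx -= 1
        if zf:                          # REPNE stops as soon as ZF = 1
            break
    return edi, ecx, zf
```

Note that after a successful scan EDI points one element past the match, because the pointer is adjusted before the repeat condition is tested.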


Pointer manipulation 


Pointer manipulation instructions load a 48-bit pointer into any pair of the segment 
and general registers. The format of these instructions is: 


Lxx reg, mem 


where xx stands for the segment register (SS, DS, ES, FS, or GS), reg is any general 
register, and mem is a memory operand. 


The LEA (load effective address) instruction computes 32-bit addresses. LEA loads a
32-bit register with the address defined by the memory operand, which is unusual
because other instructions operate on the value stored at the memory operand
location. The following example shows how to use the LEA instruction to compute a
pointer:


VECTOR  DD    20 DUP (?)           ; array of 20 elements
        MOV   EAX, 9               ; array index
        LEA   EAX, VECTOR[EAX*4]   ; get pointer to array element 9
        PUSH  EAX                  ; push pointer on stack
        CALL  MYSUBR               ; invoke subroutine


Because the LEA instruction essentially performs only additions and shifts on the 
values of the displacement and the base and index registers, it can perform simple 
multiplications faster than the hardware multiply instructions can. For a value stored 
in a general register (such as EAX in the sample operations), the following opera- 
tions can be performed: 


Instruction                 Explanation
LEA EAX, [EAX*2]            Multiply by 2 (index)
LEA EAX, [EAX+EAX*2]        Multiply by 3 (base + index)
LEA EAX, [EAX*4]            Multiply by 4 (index)
LEA EAX, [EAX+EAX*4]        Multiply by 5 (base + index)
LEA EAX, [EAX*8]            Multiply by 8 (index)
LEA EAX, [EAX+EAX*8]        Multiply by 9 (base + index)


Using the LEA instruction in this way does not affect the flags. You cannot tell when 
arithmetic overflow has occurred, when the result is 0, and so on. Use LEA only to 
compute addresses such as array or structure indexes where overflow is not likely 
to occur. You can also view the LEA instruction as an addition instruction with four 
operands instead of two. The content of the index register is added to the base 
register and the displacement. By treating the displacement simply as a constant, 
the following formula expresses the action of LEA: 


dest reg <- index reg + base reg + const


For example, the result of the LEA ECX, [EAX][ESI][3] instruction is equivalent to the 
following operations: 


MOV ECX, EAX 

ADD ECX, ESI 

ADD ECX, 3 
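

The address arithmetic that LEA performs can be summarized in one Python function. The name lea and its keyword parameters are hypothetical; the sketch assumes the usual base + index x scale + displacement form, with the result wrapping at 32 bits and no flags affected.

```python
def lea(base=0, index=0, scale=1, disp=0):
    """Model the LEA address computation; the result wraps at 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF
```

Using the same register as base and index gives the multiply-by-3, -5, and -9 forms; for example, lea(base=7, index=7, scale=4) models LEA EAX, [EAX+EAX*4] with EAX = 7. A wrapped result is returned silently, which mirrors the absence of any overflow indication.
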
Input/Output 


Because I/O ports are usually connected to system devices, it is important to protect
against indiscriminate access to them. Secure system routines that run with I/O
privilege (CPL <= IOPL) can execute any I/O instruction. A less privileged task can
execute an I/O instruction; however, a general protection fault (interrupt 13) will
occur unless the operating system has granted the task permission to access the
specific port(s). The operating system grants permission by setting the appropriate
bits in the I/O permission bitmap of the task’s TSS (task state segment).
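

The permission check can be sketched in Python as follows. This is a model under stated assumptions rather than a description of the hardware's internals: the function name io_allowed is hypothetical, the bitmap is passed in as plain bytes, a clear (0) bit is assumed to grant access to the corresponding port, and ports that lie beyond the end of the bitmap are assumed to be denied.

```python
def io_allowed(cpl, iopl, bitmap, port, size=1):
    """Return True if an access of size bytes at port would be permitted.

    bitmap models the task's I/O permission bitmap, one bit per port.
    """
    if cpl <= iopl:
        return True                        # I/O privilege: bitmap not consulted
    for p in range(port, port + size):     # every byte of the access is checked
        byte, bit = divmod(p, 8)
        if byte >= len(bitmap) or (bitmap[byte] >> bit) & 1:
            return False                   # general protection fault (interrupt 13)
    return True
```

A word or dword access spans two or four consecutive port addresses, so in this model it succeeds only if all of the corresponding bits grant access.
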


Both the input and output instructions have three forms. The simplest form is: 


IN acc, port 
OUT port, acc 


where acc is one of the accumulator registers (AL, AX, or EAX) and port is a value
from 0 to 0FFH. These instructions can address only the first 256 I/O addresses,
but the 80386 supports as many as 65,536 I/O ports. To access the entire
range, use the following form of the instructions:


IN acc, DX 
OUT DX, acc 


In the above instructions, the I/O address is contained in the DX register. 


String instructions are the third type of I/O instructions. INS (input string) takes
input from the port specified by DX and stores the result at ES:EDI, adjusting EDI
according to the direction flag bit. OUTS (output string) reads the value at DS:ESI and
writes it to the port specified by DX. INS and OUTS can take the REP prefix,
which causes the I/O instruction to repeat until ECX is decremented to 0.


Prefix 


Prefix instructions precede other 80386 instructions. Prefixes modify the action of 
the instructions they precede. You can apply more than one prefix to an instruction. 


The most commonly used prefixes are the repeat prefixes, discussed previously
with the string instructions. If a repeat prefix is applied to any instruction other
than a string instruction, an undefined opcode fault (interrupt 6) occurs. The
following table lists the repeat prefix instructions:


Instruction         Explanation
REP                 Repeat until ECX = 0
REPE / REPZ         Repeat until ECX = 0 or ZF = 0
REPNE / REPNZ       Repeat until ECX = 0 or ZF = 1


You can apply a segment override prefix to almost any memory reference instruction.
Each of the six segment registers has a prefix instruction. The override forces
the memory reference of the modified instruction to the segment specified by the
prefix rather than to the default segment. The following table lists segment override
prefixes:


Prefix Explanation 

CS: Refer to the code segment 

SS: Refer to the stack segment 
DS: Refer to the data segment 

ES: Refer to the segment pointed to by ES 
FS: Refer to the segment pointed to by FS 
GS: Refer to the segment pointed to by GS 


For example, the instruction MOV EAX, [42H] copies the dword at offset 42H of the
data segment into EAX. When the instruction is prefixed with SS:, the dword is read
from the stack segment. Most assemblers let you specify the prefix before the
instruction or as part of the instruction. For example:


SS:
MOV EAX, [42H]


or


MOV EAX, SS:[42H]


The only memory reference instructions that cannot be prefixed by a segment
override are SCAS, STOS, and INS. These are string instructions that operate on memory
at ES:[EDI]. When a prefix instruction is applied to any other string instruction, it
overrides the DS:[ESI] pointer only. The MOVS and CMPS string instructions have
both a source (ESI) and a destination (EDI) pointer and are allowed a single prefix
instruction that overrides the DS:[ESI] pointer.


You can apply the LOCK prefix to any of the following instructions when reading or 
modifying a memory location: 


ADC, ADD, AND, BT, BTC, BTR, BTS, DEC, INC, NEG, NOT, OR,
SBB, SUB, XCHG, XOR


Notice that the XADD instruction is available only on the 80486. The LOCK prefix
asserts the hardware signal LOCK\, which ensures exclusive access to a memory
location in a multiprocessor environment.


The assembler usually inserts two additional prefix instructions, but Intel does not
give them mnemonics. I call them OPSIZ (operand size prefix) and ADRSIZ (address
size prefix). OPSIZ toggles the operand word size of the processor for the next
instruction. Normally, the machine word size is 32 bits. Prefixing a 32-bit instruction
with OPSIZ converts it to a 16-bit instruction. Similarly, when code is run in
8086-compatible or 80286-compatible mode, the default machine word size is 16 bits;
applying the OPSIZ prefix converts a 16-bit instruction to a 32-bit instruction.


In real mode, virtual 8086 mode, and 80286-compatible mode, the byte 40H is inter- 
preted as INC AX, but in native (32-bit) mode, it is interpreted as INC EAX. To incre- 
ment the AX register in native mode, you must prefix the instruction byte with the 
OPSIZ instruction. The assembler does all the work, however. If you enter the in- 
struction INC AX in a native-mode code segment, the assembler generates the bytes 
66H and 40H. The following table illustrates the bytes that the assembler generates. 


Opcode Generation in Different Modes


Native Mode                 Real, Virtual, or 80286-Compatible Mode
INC AX  -> 66H, 40H         INC AX  -> 40H
INC EAX -> 40H              INC EAX -> 66H, 40H
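

The rule behind the table can be written as a short Python sketch. The function name encode_inc_accumulator and its parameters are hypothetical; the byte values 66H and 40H are the ones given in the text.

```python
OPSIZ = 0x66         # operand size prefix byte
INC_ACC = 0x40       # opcode byte shared by INC AX and INC EAX

def encode_inc_accumulator(operand_bits, default_bits):
    """Return the bytes for INC AX (16) or INC EAX (32) in a code segment
    whose default operand size is default_bits (16 or 32)."""
    if operand_bits not in (16, 32) or default_bits not in (16, 32):
        raise ValueError("sizes must be 16 or 32")
    # The prefix is needed only when the operand size differs from the default.
    prefix = [] if operand_bits == default_bits else [OPSIZ]
    return bytes(prefix + [INC_ACC])
```

The opcode byte itself never changes; only the presence of the prefix depends on the mode.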


Similarly, the ADRSIZ prefix toggles between 16-bit addressing and 32-bit addressing.
This prefix is useful for programmers writing 80386 code that will run under a
16-bit operating system. In 16-bit mode (real, virtual, or 80286-compatible), memory
offsets are limited to 16 bits, and more rules restrict which registers you can use as
base and index values in generating addresses. These restrictions are listed in
Appendix D. The ADRSIZ toggle allows you to use the full addressing capabilities of
the 80386 and 80486.


If you use 32-bit addressing under a 16-bit operating system, be consistent about
register usage. For example, a programmer who wants to use the scaled index feature
in a program that runs under MS-DOS might code the following instruction
sequence:


        ; Increment each member of an array of 16-bit integers
        MOV   CX, count             ; get size of array
L1:     INC   array-2[ECX*2]        ; increment array element
        LOOP  L1                    ; decrement index, branch if not 0


These instructions would probably not work because the scaled address feature
requires the full 32-bit ECX register and the programmer has loaded only the 16-bit CX
register. The value of the high-order 16 bits in ECX is unknown. The correct
approach is:


        ; Increment each member of an array of 16-bit integers
        MOVZX ECX, count            ; get array size, zero-extend into ECX
L1:     INC   array-2[ECX*2]        ; increment array element
        LOOP  L1                    ; decrement index, branch if not 0


System


Application programs do not execute system instructions. In some cases, system 
instructions cannot be executed unless the process has a high privilege level. The 
following table lists system instructions. More detailed information about these in- 
structions is given in Chapter 8. 


Instruction         Explanation
LGDT mem            Load GDT base address and limit
SGDT mem            Store GDT base and limit
LIDT mem            Load IDT base address and limit
SIDT mem            Store IDT base and limit
LTR src             Load a selector into the task register
STR dest            Store the TR selector
LLDT src            Load a selector into the LDT register
SLDT dest           Store the LDT selector
VERR dest           Verify read access for dest selector
VERW dest           Verify write access for dest selector
LAR reg, dest       Load access rights for dest selector
LSL reg, dest       Load limit for dest segment
ARPL dest, src      Adjust privilege level for dest
HLT                 Halt the CPU until reset or interrupt
INVD                Invalidate internal cache (80486 only)
WBINVD              Write back and invalidate internal cache (80486 only)
INVLPG mem          Invalidate the TLB (translation lookaside buffer) entry that maps mem (80486 only)


Miscellaneous


A few instructions don’t fit into any category. For example, the NOP instruction per- 
forms no operation. 


On the 80386, the WAIT instruction tests the hardware pin BUSY\. If the BUSY\
pin is active, the CPU waits until it becomes inactive. While the 80386 is waiting, it
continues to respond to hardware interrupts; however, it returns to the WAIT after
the interrupt completes. The 80287 and 80387 coprocessors hold BUSY\ active
while they perform floating-point operations. If your program might execute on the
80386 or 80386SX, you should execute a WAIT instruction before you use the result
of a floating-point computation to ensure that the coprocessor has finished
execution. The 80486 has no BUSY\ pin, and a WAIT is essentially a NOP. The WAIT
does, however, cause the floating-point unit to check for unmasked exceptions that
can result in a math interrupt.


Floating-Point Extensions 


As discussed in Chapter 2, the 80387 NDP extends the instruction set of the 80386 
by providing hardware support for floating-point operations. In the 80486, the 
floating-point execution unit is contained on the same chip as the basic execution 
unit. The floating-point programming model is a stack-oriented model rather than 
the two-operand register/memory model of the basic execution unit. Most arith- 
metic instructions can be specified in three ways: with no operands, with a single 
operand, or with two operands. Following are some examples that illustrate the 
floating-point addition instructions. 


Instruction             Explanation
FADD                    No operands
FADD ST(n)              Single-stack operand
FADD [EBP+6]            Single-memory operand
FADD ST(2), ST          Two operands


When no operands are specified, the operands are implicit. The following
pseudocode illustrates what happens when no operand is specified:


temp <- pop() 
ST <- ST <function> temp 


When a single operand is specified, the top-of-stack is implicitly the first operand, 
so the instruction becomes: 


ST <- ST <function> op 


When two operands are specified, both operands must be floating-point registers, 
and one must be the top-of-stack. You can store the result of the operation in either 
register, which you designate by making it the first operand. 


op1 <- op1 <function> op2


Several instructions have a form that discards the current top-of-stack after the
function is performed. A suffix of P (for pop) is added to the instruction mnemonic. For
example, the instruction:


FMULP ST(3), ST 


causes the top-of-stack and ST(3) to be multiplied and stores the result in ST(3). 
Then the top-of-stack is discarded, leaving the newly created value at ST(2). 
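

The operand forms described above can be traced with a toy Python model of the register stack. The class FpuStack and its method names are hypothetical, and the model makes simplifying assumptions: values are ordinary Python floats, and the eight-register limit, tag word, and exceptions are ignored.

```python
class FpuStack:
    """Toy model of the floating-point stack; st[0] is ST and st[1] is ST(1)."""

    def __init__(self, values=()):
        self.st = list(values)

    def push(self, value):
        self.st.insert(0, value)

    def pop(self):
        return self.st.pop(0)

    def fadd(self):
        """FADD with no operands: temp <- pop(); ST <- ST + temp."""
        temp = self.pop()
        self.st[0] = self.st[0] + temp

    def fadd_st(self, n):
        """FADD ST(n): ST <- ST + ST(n)."""
        self.st[0] = self.st[0] + self.st[n]

    def fmulp(self, n):
        """FMULP ST(n), ST: ST(n) <- ST(n) * ST, then discard the top-of-stack."""
        self.st[n] = self.st[n] * self.st[0]
        self.pop()
```

Starting from a stack holding 2.0, 10.0, 20.0, and 3.0 (top first), fmulp(3) stores 6.0 in ST(3); after the pop, that product is found at ST(2), as the text describes.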


Load and store 


The load instructions push a new value onto the top of the floating-point stack, but 
the store instructions do not pop a value off unless explicitly indicated. Following 
are the relevant instructions: 


Instruction         Explanation
FBLD mem            Push an 80-bit BCD integer
FILD mem            Push a 16-, 32-, or 64-bit integer
FLD ST(n)           Push a copy of a value already loaded
FLD mem             Push a 32-, 64-, or 80-bit real
FLD1                Push 1.0
FLDL2E              Push log2(e)
FLDL2T              Push log2(10)
FLDLG2              Push log10(2)
FLDLN2              Push ln(2)
FLDPI               Push pi
FLDZ                Push 0.0
FBSTP mem           Store ST in an 80-bit packed BCD integer and pop (discard from stack)
FIST mem            Store ST in a 16- or 32-bit integer
FISTP mem           Store ST in a 16-, 32-, or 64-bit integer and pop
FST ST(n)           Store a copy of ST in ST(n)
FST mem             Store ST in a 32- or 64-bit real
FSTP mem            Store ST in a 32-, 64-, or 80-bit real and pop


Because the floating-point execution unit operates in parallel with the basic execu- 
tion unit (via coprocessing on the 80386/80387 and internally in the 80486) and be- 
cause integer instructions generally execute more rapidly than floating-point oper- 
ations, issue a WAIT (or FWAIT) instruction before using the result of a floating- 
point store operation. This ensures that the value has been written to memory and 
that the 80386 code can access the value. 


Arithmetic


The following table lists the arithmetic operations that the FPU performs. See
Chapter 8 for a description of the types of operands that each instruction supports.


Instruction             Explanation
F2XM1                   Compute 2^ST - 1, where -1 < ST < 1
FABS                    Take absolute value of ST
FADD [op(s)]            Add two floating-point numbers
FADDP op1, op2          Add op1 and op2, pop stack
FIADD mem               Add 16- or 32-bit integer to ST
FCHS                    Change the sign of ST
FCOM op                 Compare ST with op (register or memory)
FCOMP op                Compare ST with op and pop
FCOMPP                  Compare ST with ST(1), pop both
FICOM mem               Compare ST with 16- or 32-bit integer
FICOMP mem              Compare with integer and pop
FUCOM op                Compare allowing quiet NaNs
FUCOMP op               Like FCOMP
FUCOMPP                 Like FCOMPP
FCOS                    Cosine of ST
FDIV [op(s)]            Floating-point divide
FDIVP op1, op2          Divide op1 by op2, pop
FIDIV mem               Divide ST by 16- or 32-bit integer
FDIVR [op(s)]           Reverse divide (op2 by op1)
FDIVRP op1, op2         Reverse divide (op2 by op1) and pop
FIDIVR mem              Divide integer by ST
FMUL [op(s)]            Floating-point multiply
FMULP op1, op2          Multiply op1 by op2 and pop stack
FIMUL mem               Multiply ST by 16- or 32-bit integer
FPATAN                  Arctangent of ST(1)/ST, pop
FPREM                   Partial remainder of ST/ST(1)
FPREM1                  Compute partial remainder to IEEE spec
FPTAN                   Compute tangent of ST, push(1.0)
FRNDINT                 Round ST to integer
FSCALE                  Multiply ST by 2^ST(1)
FSIN                    Compute sine of ST
FSINCOS                 temp = ST, ST = sin(temp), push(cos(temp))
FSQRT                   Take the square root of ST
FSUB [op(s)]            Floating-point subtraction
FSUBP op1, op2          Subtract op2 from op1 and pop
FISUB mem               Subtract 16- or 32-bit integer from ST
FSUBR [op(s)]           Reverse subtraction
FSUBRP op1, op2         Subtract op1 from op2 and pop stack
FISUBR mem              Subtract ST from 16- or 32-bit integer
FTST                    Compare ST with 0.0
FXAM                    Examine ST and set condition codes
FXTRACT                 Decompose ST to exponent and significand, ST = exponent, push significand
FYL2X                   ST(1) = ST(1) x log2(ST), pop stack
FYL2XP1                 ST(1) = ST(1) x log2(ST + 1), pop stack


Control


Control instructions save or alter the state of the NDP. Some have a special “no 
wait” form, indicated by the letter N as the second character of the mnemonic. The 
“no wait” instructions execute without the implicit WAIT that occurs between two 
floating-point instructions. 


Normally a WAIT instruction is implied before every coprocessor operation. The 
following two instruction streams are equivalent. 


FADD  ST(3), ST          WAIT
FMUL  ST(1)              FADD  ST(3), ST
                         WAIT
                         FMUL  ST(1)


WAIT causes the CPU to check whether unmasked exceptions have occurred. In the 
80387, this is done via the ERROR\ signal. In the 80486, the error state is maintained 
internal to the CPU. If a coprocessor error is signaled, a floating-point exception (in- 
terrupt 16) occurs. “No wait” instructions allow you to save the NDP state without 
worrying about processing any floating-point exceptions. 


The processor state of the FPU is held in the registers discussed in Chapter 3. Some 
of these registers are addressable individually, but others, such as the tag word and 
error pointer registers, are not. The combination of the control word, status word, 
and error pointers is called the environment. The environment layout is shown in 
Figure 4-10.
[Figure: environment image, 32 bits wide (bits 31-16 and 15-0), arranged by address offset.]

Figure 4-10. Environment layout.


The following table lists the floating-point control instructions and their functions. 


Instruction        Explanation

F[N]CLEX           Clear all exception flags
FDECSTP            Decrement the TOP field in the status word
FFREE ST(n)        Mark ST(n) as unused
FINCSTP            Increment the TOP field in the status word
F[N]INIT           Initialize the NDP
FLDCW mem          Load the control word register
FLDENV mem         Load the floating-point environment
FNOP               No operation
FRSTOR mem         Reload the entire FPU machine state
F[N]SAVE mem       Store the entire FPU state in memory
F[N]STCW mem       Store the control word in memory
F[N]STENV mem      Store the floating-point environment
F[N]STSW mem       Store the status word
F[N]STSW AX        Copy the status word to register AX


The entire state, including all registers, tags, and pointers, must be saved and 
restored when multitasking between two or more programs that rely on the FPU. 
The FSAVE and FRSTOR instructions save and load the memory image shown in
Figure 4-11. 


The memory images described in Figure 4-11 are slightly different in an 80386 system
using the 80287. See Appendix F for information pertaining to the 80287. 


[Figure: FPU state image, 32 bits wide (bits 31-0), arranged by address offset through offset 104.]

Figure 4-11. FSAVE and FRSTOR memory layout.
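As a rough aid to reading the layout, the following sketch computes where each item falls in the image that FSAVE writes. It assumes the 32-bit protected-mode format: seven 4-byte environment slots followed by eight 10-byte stack registers. The names ENV_FIELDS, env_offset, and register_offset are illustrative only and are not part of any Intel definition.

```python
# Sketch of the 32-bit protected-mode FSAVE/FRSTOR image.
# Assumption: seven 4-byte environment slots, then ST(0)..ST(7) at 10 bytes each.
ENV_FIELDS = [
    "control word",
    "status word",
    "tag word",
    "instruction pointer offset",
    "instruction pointer selector",
    "operand offset",
    "operand selector",
]

ENV_SIZE = 4 * len(ENV_FIELDS)          # size of the environment in bytes
REG_SIZE = 10                           # one 80-bit extended-precision register
IMAGE_SIZE = ENV_SIZE + 8 * REG_SIZE    # full image saved by FSAVE


def env_offset(name):
    """Byte offset of an environment slot within the image."""
    return 4 * ENV_FIELDS.index(name)


def register_offset(n):
    """Byte offset of stack register ST(n) within the image."""
    if not 0 <= n <= 7:
        raise ValueError("ST(n) requires 0 <= n <= 7")
    return ENV_SIZE + REG_SIZE * n
```

An operating system that saves FPU state per task would reserve IMAGE_SIZE bytes in each task's save area under these assumptions.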
THE PROTECTION MECHANISM


The role of computers in society is becoming more and more significant. Computers 
process our financial transactions, count our votes at election time, control medical 
equipment, and more. As our dependency on computers grows, we need systems 
that can process multiple tasks and maintain reliability at the same time. 


In support of these goals, Intel designers implemented the protected virtual address 
mode (commonly, protected mode) on the 80286. Protected mode allows multiple 
applications to run concurrently but isolates them from one another so that failures 
in one application do not affect any other application. Although it was possible to 
implement multitasking on previous Intel microprocessors, every application had 
access to all portions of the system. A flaw in one application could easily crash the 
entire system or corrupt data associated with another task. 


The 80386 was the second Intel processor to support protected mode, and the 80486 
is the third. However, the basic mechanism is essentially unchanged from the 80286, 
except that it has been extended by use of 32-bit addressing. There is no difference 
between the 80386 and 80486 in regard to protection. This chapter discusses how 
the protection mechanism works, including privilege levels, task separation, and 
how virtual addressing is used to support the protection model. 


Selectors 


The central feature of the protection mechanism is the selector. Rather than directly 
accessing any part of the system, a program deals with a selector, which grants ac- 
cess to a system object. Associated with each object is information about it—for ex- 
ample, the object’s location, size, and type, and any restrictions on its use. 
This information is not stored in the selector for two reasons. First, the selector
would be very large, and passing it from routine to routine would take a lot of
computer time. More importantly, keeping the object information in a separate
location prevents an unscrupulously designed or errant program from corrupting
the information.


A selector is like a sealed envelope. Inside the envelope is important data that must 
be kept secure. Like a messenger permitted only to see envelopes and pass them to 
other messengers, a program can store and retrieve selectors and pass them to other 
routines. Only the operating system has access to the data inside the envelope, 
which is called a descriptor. 


Descriptors 


Aptly named, descriptors describe a system object in detail. Memory segments, as 
illustrated in Chapter 3, are one kind of system object. Other system objects include 
tables that support the protection mechanism, special segments that store the pro- 
cessor state, and access control objects called gates. 


Descriptors are grouped in descriptor tables. By examining a selector, the CPU de- 
termines which descriptor is associated with the selector and with the object to 
which the descriptor points. One item that the descriptor indicates is the privilege 
level of the object. This value is stored in the DPL field of the descriptor. When a 
program requests access to a system object with a selector, one of the following 
happens: 


■ Access is denied. If the request violates a rule of the protection mechanism (more
on this later), control passes from the program to a designated routine in the
operating system. The operating system usually terminates the process.


■ Access is permitted but impossible to grant. For example, if the object is not cur-
rently in memory, an operating system routine is called that swaps the object into
memory and returns control to the program. The program is then permitted to
retry access to the object.


■ Access is granted at the requested privilege level.


Privilege 


The protection mechanism supports four levels of increasing privilege, numbered 3,
2, 1, and 0. Privilege level 0 is the most privileged level. 


The privilege level of the selector in the CS register identifies the precedence of the 
currently executing routine and is called the current privilege level (CPL). For reli- 
ability, only the most trustworthy and crash-resistant code in the operating system 
should run at the most privileged level (CPL = 0). Applications that might fail or 
compromise the integrity of the system should run at the least privileged level
(CPL = 3).
Because the number of programs that can run at high privilege levels diminishes
near level 0 and because level 0 code is likely to exist only in the core of the 
operating system, the classic illustration of the privilege system is one of concentric 
rings, as shown in Figure 5-1. 


[Figure: concentric privilege rings, from the most secure ring at the center to the least secure outer ring, where applications run.]

Figure 5-1. Privilege rings.


The concentric ring image is so well integrated into the understanding of privilege 
that programmers often speak of code that runs “in ring 0” or “in ring 3”—another 
way of saying that the CPL of the procedure is 0 or 3. Every system object (that is, 
everything referred to by a descriptor) is associated with a privilege level and 
“resides” in a particular ring. 


The word privilege connotes rights or advantages not normally granted. On the 
80386 family, procedures running in the innermost rings can access data objects in 
the outer rings (which have less privilege), but outer-ring procedures cannot access 
objects with greater privilege. In addition, to prevent the operating system from 
crashing due to bad code, procedures cannot call other procedures that might be 
less reliable (procedures in outer rings). 


For example, a procedure running in ring 1 may access a data segment residing in
ring 2 or ring 3 but is prevented from accessing a segment whose privilege level is 0.
A ring 1 procedure, however, cannot invoke a subroutine residing in ring 2 or ring 3,
nor can it call one in ring 0. Figure 5-2 illustrates this concept.
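The two rules in the ring 1 example can be written as a pair of comparisons. This is a minimal sketch of the rules exactly as stated above; it deliberately ignores the RPL and gates, which are introduced later in the chapter, and the function names are illustrative.

```python
def can_access_data(cpl, dpl):
    """A procedure may touch data in its own ring or in any outer ring.

    Privilege numbers grow as privilege shrinks, so "outer" means
    numerically greater than or equal to the CPL.
    """
    return cpl <= dpl


def can_call_directly(cpl, target_dpl):
    """Without a gate, a call must stay within the caller's own ring."""
    return cpl == target_dpl


# The ring 1 procedure from the text:
ring1_data = [can_access_data(1, ring) for ring in range(4)]
ring1_calls = [can_call_directly(1, ring) for ring in range(4)]
```

Here ring1_data records which rings' data segments the ring 1 procedure may access, and ring1_calls records which rings it may call into directly.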


An operating system does not need to support all four privilege levels. UNIX sys- 
tems, for example, typically implement only two levels, 0 and 3. OS/2 supports three 
levels: The operating system code runs in ring 0, applications run in ring 3, and spe- 
cial routines that need access to I/O devices run in ring 2. 
[Figure: privilege rings containing data and code (programs), with legal and illegal accesses marked.]

Figure 5-2. Access between rings.


Interlevel communication 


As a security measure, concentric rings of privilege work well, but the possibility 
exists that an application running in ring 3 might need service from the operating 
system. The operating system, however, omnipotent in ring 0, is not accessible to 
the application. The application, in effect, might say, “Oh most great and worthy of 
operating systems, please grant me, thy humble and obedient servant, additional 
RAM for my stack,” but because of the access restrictions, it has no way of calling on 
the operating system. 


Various cultures have established a priesthood whose job is to act as intermediator, 
but the Intel design engineers apparently despaired of fitting something that compli- 
cated into only a few hundred thousand transistors, so they resorted to something 
simpler. It’s called a gate. 


Gates 

A gate is a system object (that is, it has its own descriptor) that points to a procedure 
in a code segment, but the gate has a privilege level separate from that of the code 
segment. Figure 5-3 shows how this changes the legal subroutine call path. 


A gate allows execute-only access to a routine in an inner ring from a less privileged 
procedure. The restriction on outward calls, however, remains in force. The protec- 
tion mechanism supports four types of gates: call, interrupt, trap, and task. Call 
gates are invoked via the standard subroutine call instruction. Interrupt gates and 
trap gates are invoked by the INT instruction or by hardware interrupts. Task gates 
are invoked by JMP, CALL, or INT instructions or by hardware interrupts. 
[Figure: privilege rings containing gates and code (programs), with legal and illegal call paths marked.]

Figure 5-3. Call paths through gates.


In a standard subroutine call, the return address and any parameters are stored on 
the stack, and execution continues at the start of the subroutine. When invoking a 
subroutine through a gate, the privilege level of the executing routine changes to 
the level of the code segment to which the gate points. When the subroutine 
returns, the privilege level is set back to that of the calling procedure. For example, 
an application executing in ring 3 might call the operating system to allocate some 
memory. The operating system code runs in ring 0, and a call gate in ring 3 points 
to the allocation routine. 


This approach solves the communication problem but introduces another one. Be- 
cause the return address (and possibly some system call parameters) is on the stack 
and the stack is a ring 3 (application) data segment, the address and parameters are 
no longer secure. The application could corrupt them while the operating system is 
processing the request. To solve this problem, part of the stack is copied to a more 
privileged stack segment as it moves through the gate, as shown in Figure 5-4 on the 
following page. Each call gate descriptor contains a field called the dword count, 
which indicates the number of 32-bit stack words to copy from the outer-ring stack
to the inner-ring stack. 
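The copy can be pictured with a small model in which each stack is a list whose last element is the top of the stack. The sketch assumes the usual ordering on the inner stack (caller's SS and ESP first, then the copied parameters, then the return CS and EIP) and a 5-bit dword count; the function name and arguments are hypothetical.

```python
def call_through_gate(outer_stack, dword_count, old_ss, old_esp,
                      return_cs, return_eip):
    """Build the inner-ring stack for a call through a call gate.

    outer_stack is the caller's stack (last element = top of stack).
    The caller's stack is left untouched; the result is a new list.
    """
    if not 0 <= dword_count <= 31:
        raise ValueError("dword count is a 5-bit field")
    if dword_count > len(outer_stack):
        raise ValueError("not enough parameters on the outer stack")

    inner_stack = [old_ss, old_esp]                 # caller's stack pointer
    first = len(outer_stack) - dword_count
    inner_stack.extend(outer_stack[first:])         # copied parameters, same order
    inner_stack.extend([return_cs, return_eip])     # return address
    return inner_stack
```

Because the parameters and the return address now live on a stack that only the inner ring can write, the application cannot alter them while the request is in progress.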


Every application must have as many stack segments as there are privilege levels in
the operating environment under which it is running. If this seems excessive,
remember that you can use the virtual memory capability to your advantage. An
application can have descriptors for more than one stack segment, but stack segments
can be marked as not present and never take up any physical memory if they are
not used.
[Figure: the application stack (ring 3) beside the ring 0 stack after a call through a gate with a dword count of 3.]

Figure 5-4. Stack privilege increase.


If the idea of four stack segments has you flipping back to the register diagram look- 
ing for additional registers, you won't find them. The active stack pointer is held in 
the SS and ESP registers. The others are stored in a system object called the task 
state segment, or TSS. 


Task state segments 

A TSS is a special memory segment that the processor uses to support multitasking. 
Its format is outlined in Figure 5-5, and it contains a copy of all the registers that 
must be saved to preserve the state of a task. It also contains values that are associ- 
ated with the task but that are not stored in CPU registers. 


The TSS contains three additional stack segment selectors (SS0, SS1, and SS2) and
three stack pointers (ESP0, ESP1, and ESP2), as shown in Figure 5-5. When a call or
interrupt through a gate causes a change in privilege, the new SS and ESP are loaded 
from the TSS. The task register (TR) contains the selector of the currently active TSS. 


When a task switch occurs, all the executing task’s registers are saved in the active 
TSS. The task register is then loaded with the selector of a new TSS, and each
general register is loaded with the values from the new TSS. Other fields in the TSS and
multitasking are discussed later in this chapter.


Descriptor tables 

As mentioned earlier, the descriptors for the memory segments, TSSs, gates, and
other system objects are grouped into descriptor tables. The three types of
descriptor tables are the interrupt descriptor table (IDT), the global descriptor
table (GDT), and the local descriptor tables (LDTs).


The IDT contains descriptors that relate to hardware and software interrupts. A
special register, IDTR, contains the linear base address and size (limit) of the IDT.
The IDT is discussed in detail later in this chapter in the section “Interrupts and
Exceptions.”
[Figure: TSS image, 32 bits wide (bits 31-16 and 15-0), arranged by address offset; a system-dependent area extends up to the TSS limit.]

Figure 5-5. Task state segment (TSS).


The GDT is the primary descriptor table. The GDT register (GDTR) contains the 
linear base address and limit of the GDT. Important descriptors that the operating 
system uses reside in the GDT. An operating system can be built using only the 
GDT and the IDT. The LDTs, however, provide an additional layer of protection and 
are helpful in building reliable systems. 


The following illustration shows the mechanism used to identify a descriptor given
a 16-bit selector. The selector is composed of three fields: the index, the table
indicator (TI), and the requested privilege level (RPL).
 15                         3   2   1 0
+----------------------------+----+-----+
|           Index            | TI | RPL |
+----------------------------+----+-----+

The RPL can be used to request access to an object at a less privileged level than is
normally granted. If you’re a canny operating system designer, you don’t necessarily 
want access at the most privileged level available to you. Using the RPL in this man- 
ner guards against misuse of highly privileged routines that can subvert the system. 


Consider a programmer who tries to snoop in a “secure” system. This programmer 
knows that an application program that attempts to access the operating system’s 
code will fail. Therefore, the programmer tries another tactic. The snooping applica- 
tion calls the operating system’s disk write routine and passes it a pointer to the sys- 
tem segment to which it wants access. The operating system routine has enough 
privilege to gain access to the segment, so no protection violation occurs, and the 
clever programmer has a disk file that contains the desired segment. Figure 5-6 il- 
lustrates this scenario. 


[Figure: application, call gate, level 0 selector, and segment, with legal and illegal access paths marked.]

Application passes the ring 0 selector (which is illegal for it to use) to the ring 0 routine.
The ring 0 routine gains access to the ring 0 segment and writes it to disk.

Figure 5-6. Access to an operating system segment.


A secure operating system can foil attempts such as this by ensuring that the
RPL field of any selector is set to the CPL of the calling routine. The ARPL (adjust
requested privilege level) instruction performs this function. When this is done, the
system can detect that the requested privilege level (RPL) of the selector is less
privileged than (numerically higher than) the DPL of the desired segment and can
refuse to complete the operation. Figure 5-7 shows the behavior of a secure
operating system in this situation.
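The effect of ARPL, and of the check that follows it, can be sketched as below. The sketch treats selectors as plain 16-bit integers whose low two bits are the RPL, and it models the access check as a comparison of the DPL against the weaker of the CPL and the RPL; the function names are illustrative and are not operating system or processor interfaces.

```python
def arpl(selector, caller_cs):
    """Model of ARPL: weaken the selector's RPL to the caller's level.

    Returns (adjusted_selector, zf). zf is True when the RPL was changed.
    """
    caller_rpl = caller_cs & 3
    if (selector & 3) < caller_rpl:
        return (selector & 0xFFFC) | caller_rpl, True
    return selector, False


def access_allowed(cpl, selector, dpl):
    """Access succeeds only if both the CPL and the RPL are as privileged as the DPL."""
    rpl = selector & 3
    return max(cpl, rpl) <= dpl


# A ring 3 caller (CS = 001BH) hands the operating system a selector
# for a ring 0 segment with RPL = 0.
forged = 0x0008
adjusted, changed = arpl(forged, 0x001B)
```

Without the adjustment, the ring 0 routine would pass the check with the forged selector; with it, the same routine is refused, even though its own CPL is 0.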


The TI bit of a selector identifies the table from which the descriptor is selected.
When TI is set to 0, the index field selects a descriptor in the GDT. A selector
value of 0033H, for example, points to GDT descriptor number 6. The first
slot in the global descriptor table, GDT(0), is never used. A selector value of 0 is used
as a null selector.* The null selector can be loaded into a data segment register
without triggering a protection fault.
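The 0033H example can be checked by pulling the three fields apart. This is a minimal sketch that assumes the layout described above (a 13-bit index in the upper bits, then TI, then a 2-bit RPL); the function names are illustrative.

```python
def decode_selector(selector):
    """Split a 16-bit selector into its index, TI, and RPL fields."""
    selector &= 0xFFFF
    return {
        "index": selector >> 3,        # upper 13 bits
        "ti": (selector >> 2) & 1,     # 0 = GDT, 1 = current LDT
        "rpl": selector & 3,           # requested privilege level
    }


def is_null_selector(selector):
    """True for GDT index 0; the RPL bits are ignored."""
    return (selector & 0xFFFC) == 0


fields = decode_selector(0x0033)
```

Note that a selector of 0004H is not null: its TI bit is 1, so it refers to the first descriptor of the current LDT.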


When TI is set to 1, the index refers to a descriptor in the current LDT. LDT(0) can
be used to hold a valid descriptor. LDTs are usually created on a per task basis and 
serve two purposes. First, because a selector is 16 bits and the index field is only 13 
bits, you can address a maximum of 8192 descriptors. Multiple LDTs allow you to 
store more descriptors. If there were only one LDT as there is only one GDT, an 
operating system might run out of space to store descriptors. 


[Figure: application, call gate, level 0 selector, and segment, with legal and illegal access paths marked. ARPL adjusts selector to same privilege as application.]

Figure 5-7. Secure operating system using ARPL.


* The RPL portion of the null selector is ignored, so any of the values 0, 1, 2, or 3 are valid null 
selectors. 
Second, the LDT also gives you increased security. Figure 5-8 represents an operat-
ing system that uses only the GDT to store descriptors. The descriptors below 100 
point to various operating system objects and are all ring 0 objects. GDT(100) is a 
ring 3 descriptor for the code segment of application A, and GDT(101) is the data 
segment descriptor, also in ring 3. Descriptors 102 and 103 are the descriptors for the 
code and the data of application B.


Any attempt by application A to access outside its code and data segments results in 
a protection violation. However, what if application A attempts to forge a selector? 
That is, what if the application tries to create an otherwise valid selector for a seg- 
ment that doesn’t belong to it? Creating a selector for any of the first 100 GDT slots 
results in a protection violation because the operating system descriptors are ring 0 
objects. If application A creates a selector for GDT(103), however, it can potentially 
access (or destroy) data for application B. The 80386 family prevents access be- 
tween rings but not inside the same ring. 


Figure 5-9 shows the solution to the problem. If each application is given its own 
LDT, the GDT can be reserved for system use. All descriptors in the GDT point to 
objects in rings 0, 1, or 2. The LDT for each task contains the ring 3 (application) 
code and data segments. Each application has a separate LDT, so a forged selector 
can refer to objects only in the GDT, which are more privileged and therefore inac- 
cessible, or to objects in its own LDT. Thus, the LDT defines a virtual address space 
for the application, and each task has a separate, nonoverlapping address space. 
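The following sketch models why a forged selector is harmless once each task has its own LDT. The tables are toy lists of (name, DPL) pairs, the privilege check is reduced to a comparison against the weaker of the CPL and the RPL, and every name here (GDT, LDT_A, LDT_B, resolve) is hypothetical.

```python
# Toy descriptor tables: each entry is (object name, DPL) or None if unused.
GDT = [None, ("OS code", 0), ("OS data", 0)]
LDT_A = [("A code", 3), ("A data", 3)]
LDT_B = [("B code", 3), ("B data", 3)]


def resolve(selector, cpl, gdt, current_ldt):
    """Look up a selector the way the running task would see it."""
    index = (selector & 0xFFFF) >> 3
    use_ldt = (selector >> 2) & 1
    rpl = selector & 3
    table = current_ldt if use_ldt else gdt
    if index >= len(table) or table[index] is None:
        raise LookupError("no such descriptor")
    name, dpl = table[index]
    if max(cpl, rpl) > dpl:
        raise PermissionError("object is more privileged than the requester")
    return name


def make_selector(index, use_ldt, rpl):
    return (index << 3) | (use_ldt << 2) | rpl


# Application A, running at CPL 3, tries every selector it can build.
reachable_by_a = set()
for table_bit in (0, 1):
    for index in range(4):
        try:
            reachable_by_a.add(resolve(make_selector(index, table_bit, 3), 3, GDT, LDT_A))
        except (LookupError, PermissionError):
            pass
```

While application A is running, the current LDT is LDT_A, so nothing A can construct ever indexes into LDT_B.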


[Figure: address space A and address space B, with code and data descriptors held in a single GDT.]

Figure 5-8. Operating system using only the GDT.


Figure 5-9. Operating system using a GDT and LDTs.


As Figure 5-9 indicates, an LDT is also a system object with its own descriptor. The 
next section illustrates the general format of descriptors. 


Descriptor Formats 


Figure 5-10 illustrates the formats for the three types of descriptors: program
memory segments, system segments, and gates. Program memory segment
descriptors were introduced in Chapter 3. System segment descriptors describe LDTs
and TSSs. Like program memory segment descriptors, system segment descriptors
describe regions of memory and have
a base and a limit. However, you cannot load a descriptor for an LDT or a TSS into a 
segment register and read or write the contents as data. For an operating system to 
update an LDT or a TSS, it must create a memory segment descriptor with the same 
base address and limit, called an alias. Programs such as debuggers, which let you 
modify your program’s code segments, must also create aliases because code seg- 
ments are not writable under the 80386-family protection rules. 
[Figure: three descriptor layouts with bit positions 48, 47, 32, 31, 16, and 15 marked; the gate layout includes offset and dword count fields.]

Figure 5-10. General descriptor format: system, memory, and gate descriptors.


System segments are differentiated from memory segments by a value of 0 in the S 
bit of the descriptor. The TYPE field of a system descriptor can hold any of the fol- 
lowing values: 


 0    Unused (invalid descriptor)
 1    80286 TSS
 2    LDT
 3    Busy 80286 TSS
 9    80386/80486 TSS
11    Busy 80386/80486 TSS


A gate descriptor does not delineate a memory region and therefore has no base ad- 
dress or limit fields. Instead, a gate points to another descriptor via a selector. Call, 
interrupt, and trap gates must contain the selector for a code segment and an offset 
into the segment. Task gates hold a selector for a TSS, and the offset portion of the 
descriptor is unused. 


Gate descriptors, like system segment descriptors, have the S bit set to 0 and can 
contain one of the following values in the TYPE field: 


 4    80286 call gate
 5    Task gate
 6    80286 interrupt gate
 7    80286 trap gate
12    80386/80486 call gate
14    80386/80486 interrupt gate
15    80386/80486 trap gate


TYPE field values of 8, 10, and 13 are reserved for future Intel processors.
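The two lists of TYPE values fit naturally into a lookup table. The sketch below simply restates the values given above for descriptors whose S bit is 0; the names SYSTEM_TYPES, describe_system_type, and is_gate are illustrative.

```python
# TYPE field values for descriptors with S = 0, as listed in the text.
SYSTEM_TYPES = {
    0: "unused (invalid descriptor)",
    1: "80286 TSS",
    2: "LDT",
    3: "busy 80286 TSS",
    4: "80286 call gate",
    5: "task gate",
    6: "80286 interrupt gate",
    7: "80286 trap gate",
    9: "80386/80486 TSS",
    11: "busy 80386/80486 TSS",
    12: "80386/80486 call gate",
    14: "80386/80486 interrupt gate",
    15: "80386/80486 trap gate",
}
RESERVED_TYPES = {8, 10, 13}
GATE_TYPES = {4, 5, 6, 7, 12, 14, 15}


def describe_system_type(type_field):
    """Name the object a system or gate descriptor refers to."""
    if not 0 <= type_field <= 15:
        raise ValueError("TYPE is a 4-bit field")
    if type_field in RESERVED_TYPES:
        return "reserved"
    return SYSTEM_TYPES[type_field]


def is_gate(type_field):
    """True if the TYPE value denotes a call, interrupt, trap, or task gate."""
    return type_field in GATE_TYPES
```

A debugger or descriptor-table dump utility could use such a table to label each entry it prints.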


Descriptor types 1, 3, 4, 6, and 7 are used on the 80286. Operating systems designed 
for the 80286 (such as OS/2 V1.x) run without modification on the 80386, so these 
descriptor types are fully supported. A native mode system (such as OS/2 V2.x), 
however, or one that supports both 16-bit and 32-bit programs, uses full 32-bit de- 
scriptors. You can use 16-bit code and data descriptors in a 32-bit system, but using 
16-bit system descriptors (such as task state segments) can lead to difficulties. 


Multitasking 


I have previously shown how the processor uses call gates to implement interlevel 
subroutine calls. Interrupt and trap gates are discussed later in this chapter. The fol- 
lowing sections show how the remaining system objects (TSSs, LDTs, and task 
gates) are used to implement robust multitasking operating systems. 


Simply defined, a task is “a sequence of related actions leading to the
accomplishment of some goal.” In a computer, the resources required to accomplish
the goal are usually included in the definition of a task: that is, the amount of
memory, CPU time, disk space, and so on.


The term multitasking refers to the ability of a computer to execute more than one 
task simultaneously. The basic execution unit cannot execute more than one in- 
struction stream at once, but it can execute one instruction stream, switch to an- 
other, execute it, switch to a third, execute it, switch back to the original, and so on. 
Because the CPU executes so rapidly, all tasks appear to execute simultaneously. 
Concurrency and multiprogramming are synonyms for multitasking. 


An executing task is called a process. Thus, some people refer to multitasking as 
multiprocessing. Others, however, use the word multiprocessing to refer to systems 
in which multiple CPUs or processors are running simultaneously. To avoid confu- 
sion, I do not use the term multiprocessing, and I refer to computers with more than 
one CPU as multiprocessor systems. 


Assume that each task in a computer is implemented by a single program; therefore, 
multiple programs must share the CPU. Various strategies exist for sharing the CPU,
but to discuss and compare these strategies is beyond the scope of this book. At 
some level, each system must turn over control of the CPU from one task to another. 


The first task might be in the middle of a computation when control is wrested from 
it and passed to another task; when the first task resumes, it must be able to con- 
tinue processing as though nothing had happened. All the registers that the task was 
using must be restored to their original values when that task regains control. 


The 80386/80486 hardware supports this kind of task switching via the TSS. Figure 
5-11 on the following page depicts the memory layout of the TSS. Each TSS has only 
one descriptor, which defines its base memory address and limit. Figure 5-11 shows 
the TSS descriptor format immediately above the TSS. To allow access to the TSS by 
different privilege levels or via interrupts, you must use task gates. 
Figure 5-11. Task state segment and descriptor.


The TSS descriptor is similar to that of a typical memory segment; however, the S bit 
is 0, indicating that the TSS is a system segment. The TYPE field for a TSS contains 
either a binary 1001B or 1011B (decimal 9 or 11). The variable bit is called the busy 
bit. This bit is set to 1 in the currently executing task and in any tasks that have 
called the current task, establishing a chain of nested tasks. Any attempt to invoke a 
task that is marked as busy triggers an exception. 
The selector in the task register (TR) identifies the current task. Usually, this register
is loaded once at initialization time and then is managed by the task switch opera- 
tion. Loading TR does not cause a task switch; it does identify the active TSS, 
however. 


When a task switch occurs, the state of the currently executing task is saved in its 
TSS, and the CPU registers are loaded from the image of the new or destination TSS. 
The task register contains a selector for the currently active TSS. TSS descriptors can 
be located only in the GDT. 


Part of the TSS in Figure 5-11 is gray. The gray portion indicates values that are not 
stored in the outgoing TSS during a task switch, although new values are loaded 
from the destination TSS. If any gray value changes during execution of the task, the 
operating system must ensure that the TSS is kept current. The application cannot 
change these values; they require kernel support (privilege level 0) to be modified. 


The bulk of the TSS holds copies of the general register set: EAX through EDI, the segment
registers, EFLAGS, and EIP. In addition, the TSS contains these fields: 


Back link: The selector of the TSS that was previously executing.


SSn, ESPn: The stack pointers for ring n execution, as discussed in the section on
call gates.


CR3: Control register 3, which defines the physical memory address of the page
tables for the task.


LDTR: The selector of the LDT for the task.


T: The “trap on task switch” bit. A debug fault (interrupt 1) occurs when this bit is
set to 1 in the incoming TSS.


I/OP bitmap base: A 16-bit offset into the TSS that indicates the start of the I/O
permission bitmap. If this field is set to 0, no I/O permission bitmap exists.


System dependent: The portion of the TSS that the operating system can use to
store any operating system-specific information about the task.


I/O permission bitmap: The field that starts at the offset indicated by the I/OP
bitmap base and continues to the end of the TSS or to the base plus 8192.
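The fixed part of the TSS can be tabulated to see where each field lands. The sketch assumes the standard 32-bit TSS ordering, with each 16-bit selector padded to 32 bits by a reserved word; the names TSS_FIELDS and tss_offsets are illustrative, and the figure remains the authority for the exact layout.

```python
# (field name, size in bytes) in memory order; "reserved" marks padding words.
TSS_FIELDS = [
    ("back_link", 2), ("reserved", 2),
    ("ESP0", 4), ("SS0", 2), ("reserved", 2),
    ("ESP1", 4), ("SS1", 2), ("reserved", 2),
    ("ESP2", 4), ("SS2", 2), ("reserved", 2),
    ("CR3", 4), ("EIP", 4), ("EFLAGS", 4),
    ("EAX", 4), ("ECX", 4), ("EDX", 4), ("EBX", 4),
    ("ESP", 4), ("EBP", 4), ("ESI", 4), ("EDI", 4),
    ("ES", 2), ("reserved", 2), ("CS", 2), ("reserved", 2),
    ("SS", 2), ("reserved", 2), ("DS", 2), ("reserved", 2),
    ("FS", 2), ("reserved", 2), ("GS", 2), ("reserved", 2),
    ("LDTR", 2), ("reserved", 2),
    ("T", 2), ("io_map_base", 2),
]


def tss_offsets(fields=TSS_FIELDS):
    """Return ({field: byte offset}, total size) for the fixed part of the TSS."""
    offsets = {}
    position = 0
    for name, size in fields:
        if name != "reserved":
            offsets[name] = position
        position += size
    return offsets, position


OFFSETS, FIXED_SIZE = tss_offsets()
```

Under these assumptions the fixed part occupies 104 bytes (68H), so any system-dependent data or I/O permission bitmap must begin at or beyond that offset.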


Task switching 


Four events can cause a task switch: 


■ The current task executes a FAR CALL or JMP instruction in which the selector
points to a TSS descriptor.


■ The current task executes a FAR CALL or JMP instruction, and the selector points
to a task gate.


■ The current task executes an IRET instruction to return to the previous task. An
IRET causes a task switch only if the NT (nested task) bit of the EFLAGS register
is set to 1.


■ An interrupt or exception occurs, and the IDT entry for the vector is a task gate.


For any task switch, the following events take place: 


1. If the task switch is not caused by a hardware interrupt, an exception, or an 


IRET, the descriptor privilege rules are checked. The DPL of the descriptor (TSS 
or task gate) must be numerically less than the current task’s CPL and the selec- 
tor’s RPL. 


. The present bit and limit of the descriptor for the current (outgoing) TSS is 


checked to ensure that the TSS is present and can hold at least 104 bytes of state 
information. If so, the current machine state is saved; otherwise, an exception 
occurs. 


. The present bit and limit of the descriptor for the new Gncoming) TSS is 


checked. If the TSS is not present or is too small, an exception occurs; other- 
wise, all the register images are loaded. If the value of CR3 has changed, the 
TLB cache (see Chapter 7) is flushed. 

At this point, all the general and segment registers are loaded, but the shadow 
registers are not. CS might have a value of 217FH, but the descriptor for selector 
217FH has not been loaded. The state of the outgoing task has been saved, how- 
ever, and any exceptions that occur are in the context of the new state, even if 
the CS descriptor is not present or is invalid. 


4. The linkage to the outgoing task is established. What happens next depends on
what caused the task switch.


a. If the task switch was caused by a JMP instruction, the TSS descriptor of the
outgoing task is marked as not busy, and the incoming task descriptor is
identified as a busy TSS.


b. If the task switch was caused by an interrupt or a CALL instruction, the
outgoing task remains busy, and the incoming task is also marked as a busy
TSS. Additionally, the NT bit of the EFLAGS register is set to 1, and the back
link field of the incoming TSS is set to the selector of the outgoing TSS.


c. If the task switch was caused by an IRET instruction, the outgoing task is
set to not busy.


5. The task switched (TS) bit in CR0 is set to 1, and the current privilege level for
the incoming task is taken from the RPL field of the CS selector in the TSS.


6. The LDTR shadow registers are loaded if the LDTR contains a valid selector. If
the LDTR value is 0 (the null selector), no action is taken. If the selector is invalid
or if the new LDT is not present, an exception occurs.


7. The descriptors for CS, SS, DS, ES, FS, and GS are loaded into the shadow registers
in that order. All descriptors are tested for privilege violations (CPL has
already been established) and must be marked present; otherwise an exception
occurs.


8. The local enable bits in DR7 are cleared to 0.


9. If the T bit of the incoming TSS is set to 1, a debug fault (interrupt 1) occurs.


10. The new task begins executing by fetching the instruction at CS:EIP.
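

The linkage rules of step 4 can be modeled in a few lines. The following Python
sketch is only an illustration of the bookkeeping, not processor code: the Task
class, its field names, and the selector values are hypothetical, and the busy flag
stands in for the busy bit of the TSS descriptor.

```python
from dataclasses import dataclass


@dataclass
class Task:
    selector: int         # TSS selector (illustrative value only)
    busy: bool = False    # stands in for the busy bit of the TSS descriptor
    back_link: int = 0    # back link field of the TSS
    nt: bool = False      # NT bit in the task's EFLAGS image


def link_tasks(outgoing: Task, incoming: Task, cause: str) -> None:
    """Apply step 4 of a task switch; cause is 'jmp', 'call', 'int', or 'iret'."""
    if cause == "jmp":
        outgoing.busy = False
        incoming.busy = True
    elif cause in ("call", "int"):
        # The outgoing task stays busy while the new task is nested on top of it.
        incoming.busy = True
        incoming.nt = True
        incoming.back_link = outgoing.selector
    elif cause == "iret":
        outgoing.busy = False
    else:
        raise ValueError(f"unknown task switch cause: {cause}")
```

Calling link_tasks(a, b, "call") followed by link_tasks(b, a, "iret") leaves task a
busy and task b not busy, which is the nesting behavior described above.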


I/O permission bitmap 


Two conditions determine whether a task is allowed to perform I/O: the I/O privilege
level and the I/O permission bitmap. The IOPL bits in the EFLAGS register determine
the I/O privilege level. The IOPL defines the least privileged level that can
perform an I/O instruction without restriction. For example, if IOPL = 2, I/O
instructions can be performed by procedures executing at levels 0, 1, or 2. An attempt
to execute an I/O instruction by a ring 3 application must be further validated by
the I/O permission bitmap.


If the CPL of the current task is greater than IOPL (that is, if I/O is restricted for that
task), the I/O permission bitmap is checked. This protects the I/O address space on
an individual I/O port basis. The TSS stores an I/O permission bitmap for every
task. The bitmap begins at the offset in the TSS specified by the 16-bit I/O map base
value. The I/O map base value must be greater than or equal to 68H.


The I/O permission bitmap is a maximum of 8192 bytes, with one bit for each of the 
possible 65,536 I/O ports. If the bit in the bitmap corresponding to the I/O port is 
set to 1, then the task does not have access to the port, and a general protection fault 
will occur if the task attempts to execute an I/O instruction at that port. 


The I/O permission bitmap is not required to be 8192 bytes. The limit field of the
TSS descriptor specifies the end of the bitmap. If the I/O map base value is greater
than or equal to the limit value, the TSS contains no I/O permission bitmap. All
ports that do not have a bitmap position in the TSS are protected from access.
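

The two-stage check described above (IOPL first, then the bitmap) can be sketched
as follows. This is a simplified model written in Python, not the processor's exact
algorithm: the function name, its parameters, and the sample TSS image are
assumptions made for illustration, and the TSS is represented as a plain byte array.

```python
def io_allowed(cpl, iopl, port, width, io_map_base, tss_limit, tss):
    """Return True if an I/O access of `width` bytes starting at `port` is permitted."""
    if cpl <= iopl:
        return True                    # I/O is unrestricted at this privilege level
    for p in range(port, port + width):
        offset = io_map_base + p // 8  # byte of the TSS holding this port's bit
        if offset > tss_limit:
            return False               # no bitmap position: the port is protected
        if tss[offset] & (1 << (p % 8)):
            return False               # bit set to 1: access denied
    return True


# Hypothetical TSS image: 104 bytes of task state followed by a 2-byte bitmap
# that permits ports 8 through 12 only (bits for those ports are 0).
tss = bytearray(104) + bytes([0xFF, 0xE0])
io_map_base = 0x68
tss_limit = len(tss) - 1
```

With this image, a ring 3 task with IOPL = 0 can read a dword at port 8, but a word
access at port 12 fails because it also touches port 13.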


Figure 5-12 shows a sample bitmap. The task with this TSS can access ports 8, 9,
10, 11, and 12. A subroutine in this task can access byte ports 8, 9, 10, 11, and 12,
word ports 8 and 10, or dword port 8.


Figure 5-12. I/O permission bitmap in TSS.


Interrupts and Exceptions 


Interrupt is a term that refers to a variety of similar control transfers. The specific 
items implied by this term are true interrupts (hardware interrupts) and exceptions, 
which are further subdivided into traps, faults, and aborts. 


All interrupts and exceptions share a common feature: The current execution location
(CS:EIP) and flags register (EFLAGS) are saved on the stack, and control transfers
to a software routine called an interrupt handler via a gate in the interrupt
descriptor table (IDT). The processor supports a maximum of 256 descriptors in the
IDT. Every interrupt or exception is associated with one of these interrupt numbers.
Interrupt numbers 0 through 31 are reserved for specific purposes assigned by Intel;
the operating system can assign numbers 32 through 255.


The kinds of interrupts and exceptions are: 


Interrupts—True interrupts are caused by hardware signals that originate outside
the CPU. Two pins on the 80386 or 80486, NMI and INTR, signal interrupts. Asserting
the NMI pin activates a nonmaskable interrupt. The NMI interrupt always invokes
the routine associated with interrupt vector (IDT entry) 2.


An active signal on the INTR line causes a maskable interrupt. The CPU does not respond
to a maskable interrupt unless the IF bit of the EFLAGS register is set to 1.
When the IF bit is 0, interrupts are not recognized and are said to be masked. If the
processor responds, it issues an interrupt-acknowledge bus cycle, and the interrupting
device must respond with an interrupt number. Use only values 32-255 for
maskable interrupts.


Traps— These are conditions that the processor regards as errors and detects after 
the execution of a software instruction. The saved instruction pointer (CS:EIP) on 
the stack points to the instruction immediately after an instruction that has trapped. 


A classic example of a trap is the INTO instruction. When INTO is executed, the
processor checks the value of the overflow flag (OF). If OF = 1, the CPU vectors
through IDT descriptor 4.


All software interrupt (INT) instructions are handled as traps. To issue one of these
instructions, however, a procedure must have access privilege to the IDT descriptor
for the interrupt number. For example, if a ring 3 application executes an INT 47
instruction, the descriptor at IDT(47) must have DPL = 3; otherwise, a protection fault
occurs. This mechanism prevents applications from issuing INT instructions for
vectors associated with hardware interrupts because the gates for these vectors
point to operating system code that runs at high privilege levels, usually ring 0.
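

The privilege test applied to software interrupts amounts to a single comparison.
The Python sketch below assumes a hypothetical IDT in which vector 32 is a hardware
interrupt gate with DPL = 0 and vector 47 is the application-callable gate from the
example above; the function and table names are illustrative only.

```python
def int_n_allowed(cpl: int, gate_dpl: int) -> bool:
    """A software INT n is permitted only if CPL is numerically <= the gate's DPL."""
    return cpl <= gate_dpl


# Hypothetical IDT contents: vector number -> DPL of the gate.
idt_dpl = {
    32: 0,   # assumed hardware interrupt vector, reachable by ring 0 code only
    47: 3,   # gate that ring 3 applications may invoke with INT 47
}
```

A ring 3 caller passes the check for vector 47 but not for vector 32, in which case
the processor raises a protection fault instead of entering the handler.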


Faults — When the execution unit detects an error during the processing of an in- 
struction (for example, when the instruction’s operand is stored in a page frame 
marked not present), a fault occurs. A specific interrupt number is associated with 
each fault condition. The instruction pointer saved on the stack after a fault occurs 
points to the instruction that caused the fault. Thus, the operating system can cor- 
rect the condition and resume executing the instruction. 


Aborts— When an error is so severe that some context is lost, the result is an abort. 
It might be impossible to determine the cause of an abort, or it might be that the in- 
struction causing the abort is not able to be restarted. 


The following table lists all of the exceptions handled by the processor:


80386/80486 Exceptions

Interrupt Number   Class               Description
0                  Fault               Divide error
1                  Fault or trap       Debugger interrupt
2                  Interrupt           Nonmaskable interrupt
3                  Trap                Breakpoint
4                  Trap                Interrupt on overflow (INTO)
5                  Fault               Array boundary violation (BOUND)
6                  Fault               Invalid opcode
7                  Fault               Coprocessor not available
8                  Abort               Double fault
9                  Abort               Coprocessor segment overrun (reserved on 80486)
10                 Fault               Invalid TSS
11                 Fault               Segment not present
12                 Fault               Stack exception
13                 Fault               General protection violation
14                 Fault               Page fault
15                                     Reserved
16                 Fault               Coprocessor error
17                 Fault               Alignment check (80486 only)
18-31                                  Reserved
32-255             Interrupt or trap   System dependent


One class of error is more severe than an abort. If the processor is unable to continue
processing an exception, it shuts down. In a protected-mode environment, the
system should shut down only if a hardware failure occurs. To prevent shutdown,
the vectors that handle the double fault (interrupt 8) and invalid TSS (interrupt 10)
conditions should be separate tasks, and IDT entries 8 and 10 should be task gates.
This approach allows the CPU to load a new machine state from which to handle
the exceptions. If this is not done, the exception handler might be running in the
same environment that caused the failures and might not be able to continue
processing.


Interrupt gates, trap gates, and task gates


The only types of descriptors that can reside in the IDT are interrupt gates, trap 
gates, and task gates. Task gates in the IDT are identical to those in the GDT and op- 
erate in the same manner. 


When a task gate is invoked with an interrupt or with an exception, the machine 
state is saved in the existing TSS, and a new state is loaded from the TSS associated 
with the task gate. Thus, an interrupt can have its own address space, including its 
own page tables and LDT. In addition, the interrupt handler is prevented from using 
too much of the interrupted application’s stack and from corrupting any registers. A 
task switch takes longer to execute than a gate transfer, however, and the advantages 
of invoking a task gate must be weighed against performance considerations. 


The most common entries in the IDT are interrupt gates and trap gates. These de- 
scriptors have identical formats—only the type code is different. Figure 5-13 illus- 
trates the descriptor format for interrupt gates. The only difference in behavior 
between the two gates is that when an interrupt gate is activated, the IF bit of the 
EFLAGS register is cleared to 0. Hardware interrupts are masked until the interrupt 
handler deems it safe to reenable them. Transferring control through a trap gate 
does not modify the interrupt flag. 
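

The difference between the two gate types can be expressed as a small function.
This Python sketch models only the IF behavior described above; the function name
and the string gate types are hypothetical, and the mask value reflects the position
of IF at bit 9 of EFLAGS.

```python
IF_MASK = 1 << 9   # interrupt-enable flag (IF), bit 9 of EFLAGS


def enter_handler(eflags: int, gate_type: str):
    """Return (saved_eflags, handler_eflags) for a transfer through a gate."""
    saved = eflags                 # the image pushed on the stack keeps the old IF
    if gate_type == "interrupt":
        eflags &= ~IF_MASK         # interrupt gate: mask further INTR interrupts
    elif gate_type != "trap":
        raise ValueError(f"unknown gate type: {gate_type}")
    return saved, eflags
```

Because the saved image is unchanged, the IRET at the end of the handler restores
the interrupted program's original IF setting in either case.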


Figure 5-13. Interrupt gate and trap gate descriptor format.


The behavior of interrupt gates and trap gates is similar to that of call gates. 
Although interrupt gates and trap gates do not contain a word count field, they can 
point to code segments of specific privilege levels or to conforming segments. 
Figure 5-14 shows the layout of the stack when an interrupt handler is invoked. 


An interrupt handler must return to the calling routine via an IRET instruction. The 
IRET restores the original instruction pointer, flags, and stack segment. If the NT 
(nested task) bit was set in the EFLAGS register, a task switch to the original TSS 
also occurs. The programmer should remove any error code (generated by the fault) 
from the stack before returning from the interrupt handler.


Figure 5-14. Interrupt stack without and with privilege transition.


80386-family processor exceptions 


The following sections explain the faults, traps, and aborts that can occur during
program execution. Some exceptions cause a control transfer via the IDT; others
cause an error code to be pushed onto the stack as well. If an error code is pushed,
it is pushed onto the stack of the interrupt handler; that is, it is pushed after any
privilege level or task transition. Exceptions that cause error codes to be pushed onto
the stack are indicated in the following sections with the symbol [ec]. The value of
the error code is either 0 or as defined in the following layout:


Bits 31-16   Reserved
Bits 15-3    Selector index
Bit 2        TI
Bit 1        I
Bit 0        EX


The selector index and TI fields are taken from the selector of the segment associated
with the exception. Instead of an RPL field, however, the error code has an I bit
and an EX bit. The I bit is set to 1 when the index refers to an IDT index, and the TI
bit is ignored. When I = 0, the TI bit indicates whether the selector is from the GDT
(TI = 0) or from the current LDT (TI = 1). If the EX bit is set to 1, the fault was caused
by an event outside the executing program.


Interrupt 0—Divide (fault)

A divide fault occurs if division by zero is attempted or if the result of a divide
operation does not fit into the destination operand. (This applies only to division by DIV
or IDIV, not to floating-point division.)


Interrupt 1— Debugger (fault or trap) 
This exception is triggered by one of the following conditions: 


Debug register breakpoint 
Single step trap 
Task switch trap 


The “Debugging” section later in this chapter covers the triggering and handling of 
debug traps in detail. 


Interrupt 2— NMI (interrupt) 
IDT vector 2 is reserved for the hardware NMI condition. No exceptions trap 
through vector 2. 


Interrupt 3—Breakpoint (trap) 
Debuggers use the breakpoint interrupt (INT 3), which is covered in the “Debug- 
ging” section later in this chapter. 


Interrupt 4— Overflow (trap) 

The overflow trap occurs after an INTO instruction has executed if the OF bit is set 
to 1. The INTO instruction is useful in languages such as Ada that require arithmetic 
instructions either to produce a valid result or to raise an exception. 


Interrupt 5— Bounds check (fault) 

Like interrupt 4, the bounds check exception occurs as the result of a software
instruction. The BOUND instruction compares an array index with an upper bound and a
lower bound. If the index is out of range, the processor faults through vector 5.


Interrupt 6—Invalid opcode (fault) 
An interrupt 6 fault occurs if: 


■ The processor tries to decode a bit pattern that does not correspond to any legal
machine instruction.


■ The processor tries to execute an instruction that contains invalid operands.


■ The processor tries to execute a protected-mode instruction while running in
real mode or in virtual 8086 mode.


■ The processor tries to execute a LOCK prefix with an instruction that cannot be
locked.


Opcodes that are illegal on the 8086 or cause an invalid opcode fault on the 80286
do not always cause an exception when the 80386/80486 executes in real mode.
The opcodes might correspond to new instructions that are valid in any
80386/80486 operating mode.


Interrupt 7—Coprocessor not available (fault) 

When a computer does not contain an 80287 or 80387 coprocessor, the operating
system can set the EM bit of register CR0 to indicate NDP software emulation. If the
EM bit of register CR0 is set, an interrupt 7 fault occurs each time a floating-point
instruction is encountered.


This fault also occurs if the MP bit of CR0 is set and the 80386 executes a WAIT or
floating-point instruction after a task switch. The task switch sets the TS bit to 1.
The operating system can clear TS after a task switch to prevent the fault from
occurring. The 80386 uses this method to signal that the state of the math coprocessor
needs to be saved so that it can be used by another task.


Interrupt 8—Double fault (abort) [ec]

Processing an exception sometimes triggers a second exception. For example, sup- 
pose that a divide fault occurs during the processing of an application and that the 
trap gate for interrupt 0 points to a conforming segment so that the privilege level 
does not change. Now suppose that the user stack does not have room for the CS, 
EIP, and EFLAGS pushed by the divide fault. The condition of being unable to pro- 
cess the divide exception correctly would result in a double fault. 


Not all exception pairs result in double faults. In some cases, most notably when 
getting access to the fault handler causes a page fault, the second fault is processed 
first, and then control transfers to the initial exception handler. The following table 
shows the exception pairs that trigger a double fault: 


Initial Exception             Double Fault If Followed By
0 (Divide fault)              0, 9*, 10, 11, 12, 13
9* (NDP segment overrun)      0, 9*, 10, 11, 12, 13
10 (Invalid TSS)              0, 9*, 10, 11, 12, 13
11 (Not present)              0, 9*, 10, 11, 12, 13
12 (Stack fault)              0, 9*, 10, 11, 12, 13
13 (General protection)       0, 9*, 10, 11, 12, 13
14 (Page fault)               0, 9*, 10, 11, 12, 13, 14

*Does not apply to the 80486.
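

The table reduces to a simple membership test, shown here as a Python sketch. The
set name CONTRIBUTORY and the function name are labels chosen for this example;
the vector numbers come directly from the table.

```python
# Vectors that appear in the right-hand column for every row of the table.
# Vector 9 is relevant on the 80386 only.
CONTRIBUTORY = {0, 9, 10, 11, 12, 13}
PAGE_FAULT = 14


def is_double_fault(first: int, second: int) -> bool:
    """Return True if `second`, raised while delivering `first`, is a double fault."""
    if first in CONTRIBUTORY:
        return second in CONTRIBUTORY
    if first == PAGE_FAULT:
        return second in CONTRIBUTORY or second == PAGE_FAULT
    return False
```

For example, a page fault raised while delivering a divide fault is handled normally,
whereas a second page fault raised while delivering a page fault is a double fault.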


A task gate can best handle the double fault vector, although a secure ring 0 segment 
usually works. You should use the method best suited for placing the system in a 
known state, because the processor shuts down if a third fault occurs while the
processor is trying to start the interrupt 8 exception handler.


The shutdown state is similar to the halt state. Only a processor reset or NMI (if the
NMI vector is valid) can bring the processor out of shutdown. A special shutdown
signal is placed on the bus so that external hardware can detect the shutdown. An
error code of 0 is pushed onto the stack when a double fault exception occurs.


Interrupt 9—Coprocessor segment overrun (abort) 

The coprocessor segment overrun exception is signaled when a floating-point in- 
struction causes a memory access that runs beyond the end of a segment. If the 
starting address of a floating-point operand is outside the segment limit, a general 
protection fault (interrupt 13) occurs rather than an interrupt 9. 


The segment overrun exception is classified as an abort because the instruction
cannot be restarted. You must use the FNINIT instruction to reinitialize the 80387
coprocessor. The CS:EIP saved on the stack will point to the offending instruction.
(Note: This interrupt is not generated by 80486 systems.)


Interrupt 10— Invalid task state segment (fault) [ec] 

Because the TSS contains a number of descriptors, a variety of causes can trigger an 
interrupt 10. The processor pushes an error code onto the stack to aid in diagnosing 
the error condition. The following table lists invalid TSS fault conditions and the 
value of the error code pushed onto the stack for each condition. The items are 
listed in the order in which they are checked by the CPU. 


Condition                                             Error Code Value
Outgoing TSS limit < 103                              TSS index : TI : EXT
Incoming TSS limit < 103                              TSS index : TI : EXT
LDT selector has TI = 1                               LDT index : TI : EXT
LDT descriptor has S = 1                              LDT index : TI : EXT
LDT descriptor TYPE != 2                              LDT index : TI : EXT
LDT descriptor not present                            LDT index : TI : EXT
CS selector is null                                   CS index
CS descriptor has S = 0                               CS index
CS descriptor not executable                          CS index
CS conforming, DPL > CPL                              CS index
CS not conforming, DPL != CPL or DPL < RPL            CS index
SS selector is null                                   SS index
SS selector RPL != CPL                                SS index
SS descriptor has S = 0                               SS index
SS descriptor not writable                            SS index

The following checks are made for all other selectors in the order DS, ES, FS, and GS:

Descriptor has S = 0                                  DS, ES, FS, or GS index
Descriptor is execute only                            DS, ES, FS, or GS index
Descriptor not conforming, DPL < CPL or DPL < RPL     DS, ES, FS, or GS index


The CPL value is taken from the RPL of the incoming CS selector. If one of the
memory segment descriptors is marked not present, a not present fault or stack fault
occurs rather than the invalid TSS fault. The TSS load stops at the point of the fault,
and the other exception handler must ensure that the remaining segment registers
get loaded.


Interrupt 11—Not present (fault) [ec] 

The not present interrupt lets you implement virtual memory via the segmentation 
mechanism. An operating system can mark a memory segment as not present and 
swap its contents out to disk. The interrupt 11 fault is triggered when an application 
needs to access the segment. 


This fault occurs when the processor tries to gain access to a descriptor that is not 
present (P = 0). Loading DS, ES, FS, or GS triggers the fault, as does a FAR CALL or 
JMP that either loads CS with a segment marked not present or accesses a gate 
whose descriptor is marked not present. In addition, the LLDT and LTR instructions 
cause descriptors to be loaded and can trigger the fault. 


A segment fault that occurs when loading the SS register results in a stack fault
(interrupt 12) rather than in a not present fault. Additionally, when the LDTR is loaded
during a task switch rather than by the LLDT instruction, an invalid TSS exception
occurs if the descriptor has P = 0.


The CS and EIP that are pushed onto the stack as a result of the exception usually 
point to the offending instruction. Also pushed is an error code that identifies the 
selector involved in the fault. The only time that CS:EIP does not point to the of- 
fending instruction is when a task switch occurs and a selector in the new task im- 
age causes the not present exception. 


In this case, the CS:EIP points to the first instruction of the new task. The selectors 
are loaded in the order SS, DS, ES, FS, and GS, and the task switch terminates at the 
point of the fault. The interrupt 11 fault handler must handle the fault and validate 
the remaining selectors. If the interrupt 11 fault handler is invoked via a task gate, 
this happens on the IRET that ends interrupt 11. If a trap gate invokes the interrupt, 
however, the fault handler must test each selector with the LAR instruction. 


Interrupt 12— Stack (fault) [ec] 

A task gate should handle this exception because the state of the stack is unknown
when a stack fault occurs. You can use a level 0 trap gate, but if a stack fault occurs
at ring 0, the trap to the interrupt 12 handler results in an immediate double fault.


A stack fault with an error code of 0 occurs if a normal instruction refers to memory
beyond the limits of the stack segment. This includes instructions such as PUSH
and POP, and instructions that use an SS: segment override or use EBP as a base
register. In addition, the ENTER instruction causes the same fault if it causes ESP to
be decremented beyond the lower bound of the segment. Instructions such as SUB
ESP, 10 do not cause stack faults.


If the stack fault is triggered by loading SS with a not present selector or if the fault
occurs during gated transition between privilege rings, an error code indicating the 
offending selector is pushed onto the stack. Loading SS with invalid descriptors (out 
of range, segment not writable, and so on) results in a general protection fault rather 
than a stack fault. 


When the error code is 0, this usually means that a given stack segment is too small. 
If the operating system supports expand-down segments, it can expand the stack of 
the faulting application. The saved CS:EIP points to the faulting instruction, which 
can always be restarted; however, the same caveat that applies to task switches and 
not present exceptions also applies to stack faults. See the final paragraph of “Inter- 
rupt 11—Not present (fault)[ec]” for more details. 


Interrupt 13— General protection (fault) [ec] 

Any condition not covered by some other exception triggers a general protection 
fault. This fault usually indicates that the program has been corrupted and should be 
“terminated with prejudice,” as the old UNIX phrase goes. 


The exception to this rule is that V86 mode tasks trigger general protection faults 
when the system needs to be “virtualized.” For example, a V86 task that tries to dis- 
able interrupts or issue a software interrupt instruction triggers a general protection 
fault when IOPL < 3. In such a case, the interrupt handler must determine the 
proper behavior and return control to the faulting task. 


The operating system can restart any instruction that triggers a general protection 
fault, although doing so is often inappropriate. An error code is always pushed onto 
the stack as part of the exception; in many cases, however, the value is 0. When the 
value is not 0, the value indicates the selector that caused the exception. 
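

A handler can unpack a nonzero selector-style error code using the field layout
given at the start of this section (index in bits 15-3, TI in bit 2, I in bit 1, EX in
bit 0). The Python sketch below is illustrative; the function name and the shape of
the returned dictionary are assumptions, not part of any Intel interface.

```python
def decode_error_code(ec: int) -> dict:
    """Split a selector-style exception error code into its fields."""
    index = (ec >> 3) & 0x1FFF     # selector or IDT index, bits 15-3
    if ec & 0x2:                   # I bit: index refers to the IDT, TI is ignored
        table = "IDT"
    elif ec & 0x4:                 # TI bit: selector comes from the current LDT
        table = "LDT"
    else:
        table = "GDT"
    return {"index": index, "table": table, "external": bool(ec & 0x1)}
```

For instance, an error code of 0104H decodes to index 32 in the current LDT with
the EX bit clear.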


Interrupt 14— Page (fault) [ec] 

The page fault interrupt lets you implement virtual memory on a demand-paged 
basis. An interrupt 14 occurs whenever an access to a page directory entry or page 
table entry refers to an entry with the present bit set to 0. The operating system 
makes the page present, updates the table entry, and restarts the faulting instruc- 
tion. A page fault also occurs when a paging protection rule is violated. In this case, 
the operating system needs to take other appropriate action. 


When a page fault occurs, the CR2 register is loaded with the linear address that
caused the fault, and an error code is pushed onto the stack. The page fault error
code is different from that of the other exceptions and has this format:


Bits 31-3   Reserved
Bit 2       U/S
Bit 1       W/R
Bit 0       P


The three low-order bits of the error code provide more information about why the
address in CR2 caused the fault. The P bit is set to 1 if the fault was a page protection
fault rather than a page not present fault. The W/R bit is set to 1 if the faulting
instruction was attempting to write to memory. The bit is cleared to 0 if the fault
occurred during a read. Finally, the U/S bit is set to 1 if the faulting instruction was
executing in user mode and is cleared to 0 if the instruction was a supervisor
instruction. (User mode and supervisor mode are discussed in Chapter 6.)
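

A page fault handler typically begins by separating these three bits. The following
Python sketch shows one way to do so; the function name and dictionary keys are
chosen for this example only.

```python
def decode_page_fault(error_code: int) -> dict:
    """Interpret the three low-order bits of a page fault error code."""
    return {
        # P: 1 = page protection violation, 0 = page not present
        "protection_violation": bool(error_code & 0x1),
        # W/R: 1 = the faulting access was a write, 0 = a read
        "write": bool(error_code & 0x2),
        # U/S: 1 = the access was made in user mode, 0 = supervisor mode
        "user_mode": bool(error_code & 0x4),
    }
```

An error code of 6, for example, describes a user-mode write to a page that is not
present, which is the usual demand-paging case.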


Because of the large number of divergent memory accesses that occur during a task
switch, operating system designers should ensure that important task tables (the
GDT, application TSS, and application LDT) are resident in memory before executing
the task switch. The situations that arise if page faults occur during a task switch
are not impossible to deal with, but system design is simpler if you avoid them.


Interrupt 15 
This vector is reserved for future Intel processors. 


Interrupt 16—Coprocessor error (fault) 

This interrupt occurs at the start of an ESC (coprocessor) instruction when an
unmasked floating-point exception has been signaled by a previous instruction.
(Because the 80386 does not have direct access to the FPU, it checks the ERROR# pin to
test this condition.)


The interrupt is also triggered by a WAIT instruction if the EM bit at CR0 is set.


Either of these conditions will automatically trigger the interrupt in the 80386. In the
80486, however, you must also set the NE bit in CR0 to enable the interrupt. If NE is
0, the processor will halt until an external hardware interrupt occurs.


Note: The NE bit is new in the 80486; this requirement does not apply to 80386 
systems. 


Interrupt 17— Alignment check (fault) [ec] 

This interrupt occurs only on the 80486. Interrupt 17 is reserved on the 80386. It
occurs when code executing at the application level (privilege level 3) attempts to
access a word operand that is not on an even-address boundary, a doubleword
operand whose address is not divisible by four, or a long real or temp real whose
address is not divisible by eight. Alignment checking is disabled when the processor
is first powered up. It is enabled by setting the AC bit in the EFLAGS register and the
AM bit in CR0.
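

The conditions for an alignment check fault can be summarized in a short function.
This Python sketch is a simplified model of the rules just listed; the operand names,
the table, and the function signature are assumptions made for illustration.

```python
# Required address alignment, in bytes, for each operand type named in the text.
REQUIRED_ALIGNMENT = {"word": 2, "dword": 4, "long_real": 8, "temp_real": 8}


def alignment_check_fault(address, operand, cpl, ac_flag, am_bit):
    """Return True if the access would raise interrupt 17 on an 80486."""
    if not (ac_flag and am_bit) or cpl != 3:
        return False               # checking is off, or the code is not at ring 3
    return address % REQUIRED_ALIGNMENT[operand] != 0
```

Note that both AC and AM must be set; clearing either one disables the check even
for privilege level 3 code.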


Interrupts 18-31 
These vectors are reserved for future Intel processors. 


Interrupts 32-255 

These vectors are available for use by an operating system. The system can install
interrupt, trap, or task gates in any IDT slot corresponding to one of these interrupts.
The interrupt handlers can be invoked by software INT n instructions or by hardware
that signals the CPU via the INTR pin.


Interrupt masking and priority


The only programming mechanisms for masking interrupts are the CLI/STI instruc- 
tions, which affect the hardware INTR line. However, other situations prevent cer- 
tain types of interrupts, either by design or because a more important interrupt is 
pending. Interrupts have the following priority ranking: 


1. Nondebug faults

2. Trap instructions (software interrupts INTO, INT 3, INT n)

3. Debug traps for the current instruction

4. Debug faults for the pending instruction

5. Hardware NMI

6. Hardware INTR interrupt


For example, if a page fault and a debug fault are triggered on the same instruction, 
the page fault takes priority, and the debug fault is masked. However, when the page 
fault handler completes its operation and restarts the faulting instruction, the debug 
fault is retriggered. 
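

The ranking can be modeled as an ordered list from which the highest-priority
pending event is chosen. This Python sketch is a toy model only; the event names
are shorthand labels invented for this example.

```python
# Highest priority first, following the ranking given above.
PRIORITY = [
    "nondebug fault",
    "trap instruction",
    "debug trap",
    "debug fault",
    "nmi",
    "intr",
]


def next_event(pending):
    """Return the pending event that is serviced first."""
    return min(pending, key=PRIORITY.index)
```

Given a pending page fault (a nondebug fault) and a debug fault, next_event selects
the page fault; the debug fault is raised again when the instruction restarts.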


Other interrupt masking conditions occur when: 


■ An NMI is triggered. Further NMIs are masked until the next IRET instruction
occurs.


■ A debug fault occurs. Debug faults cause the RF bit in the EFLAGS register to be
set, masking additional debug interrupts. The processor clears RF upon successfully
completing an instruction.


■ The SS register is loaded. Hardware interrupts (both NMI and INTR) and debug
exceptions (including single-step) are masked for the duration of one instruction
after SS is loaded. Thus, the ESP register can load without risk of invoking an
interrupt handler with an invalid stack pointer. The instruction that loads ESP can,
however, receive a page fault, and the interrupt 14 routine will be invoked with
an invalid stack pointer, possibly leading to a double fault. You can avoid this by
loading both SS and ESP using a single instruction, LSS.


Debugging 
Traditionally, microprocessors have never contributed much to solving the problem
of debugging. Debugging on microprocessors has been accomplished with breakpoint
instructions and with the ability to single-step (execute one instruction at a
time); but for difficult problems, programmers have had to turn to in-circuit emulators
or hardware-assisted debuggers.


As microcomputer systems become more sophisticated, hardware’s ability to determine
what is going on inside the CPU diminishes. For example, assume that a programmer
wants to be notified that a particular data structure has been modified.
Because of paging, the structure might not be in contiguous memory. The operating
system’s virtual memory capability allows it to move the program out from
under the eye of the debugging hardware, and thus the program’s linear and symbolic
addresses bear no relation to the generated hardware addresses.


Fortunately, the chip designers at Intel recognized these problems and added fea- 
tures to their processors that system software can use to aid in debugging. Four 
mechanisms trigger debug interrupts under different conditions: trap flag, task 
switch trap, breakpoint registers, and software breakpoint. 


Trap flag 

Setting the TF bit in the EFLAGS register causes a single-step trap (interrupt 1) to
occur after the next instruction executes. The CPU clears the TF bit before invoking the
handler pointed to by IDT(1), although the saved image of EFLAGS on the stack has
the trap flag set.


When a software interrupt instruction (INT, INTO) is executed, the TF bit is 
cleared. A debugger should not attempt to single-step an INT instruction but should 
place a breakpoint either at the destination of the gate pointed to by INT or imme- 
diately after the INT instruction. 


A call gate does not clear the trap flag, so a debugger should check all FAR CALLs 
and JMPs to see whether they cause a change in privilege level. If so, programmers 
should not be allowed to single-step into code more privileged than their 
applications. 


Task switch trap 

When the T bit of a TSS is set to 1, switching to the TSS’s task invokes the debugger
fault (interrupt 1). The fault does not occur until after the contents of the TSS are
loaded and before the first instruction of the task is executed.


Breakpoint registers 

The debug registers (DR0-DR7) implement four address breakpoints. When the
debug address registers are correctly initialized, each identifies a linear address. If
the processor accesses that address, then a debugger fault (interrupt 1) occurs. The
debug registers are described in detail in “Programming the debug registers” in
this chapter.

Software breakpoint - 

The single-byte INT 3 (OCCH) instruction triggers this interrupt. By replacing the 
first byte of an instruction with an INT 3, a debugger can cause a breakpoint to oc- 
cur when the execution stream reaches the INT 3. Because the software interrupts 
are classified as traps, the saved CS and EIP on the stack point to the byte immedi- 
ately after INT 3. To restart the program, the debugger must replace the OCCH value 
with the first byte of the original instruction, decrement EIP so that it points to the 
start of the instruction, and execute an IRET to return from the interrupt handler. 
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This method of implementing breakpoints is much clumsier than using the debug 
registers because it requires creating a writable alias for a code segment, saving the 
original instruction byte, replacing the instruction with an INT 3, and undoing the 
above when the breakpoint has been triggered. However, because the debug regis- 
ters allow only four active breakpoints at once, a reasonable trade-off is to use 
debug registers for data space breakpoints and INT 3 for code space breakpoints. 
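
The patch-and-restore sequence for an INT 3 breakpoint can be sketched in C. This is a minimal simulation under stated assumptions, not debugger code: an ordinary byte array stands in for the writable alias of the code segment, and the names sw_breakpoint, bp_set, and bp_hit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCCu   /* the single-byte INT 3 instruction */

/* Hypothetical bookkeeping for one software breakpoint. */
struct sw_breakpoint {
    size_t  offset;      /* offset of the patched instruction */
    uint8_t saved_byte;  /* original first byte of that instruction */
};

/* Save the original byte, then overwrite it with INT 3. */
static void bp_set(uint8_t *code, size_t offset, struct sw_breakpoint *bp)
{
    bp->offset = offset;
    bp->saved_byte = code[offset];
    code[offset] = INT3_OPCODE;
}

/* Called when the breakpoint fires. saved_eip is the EIP image on the
   handler's stack; because INT 3 is a trap, it points one byte past the
   INT 3. Restores the original byte and returns the EIP to resume at. */
static uint32_t bp_hit(uint8_t *code, const struct sw_breakpoint *bp,
                       uint32_t saved_eip)
{
    assert(saved_eip == bp->offset + 1);
    code[bp->offset] = bp->saved_byte;
    return saved_eip - 1;   /* back up to the start of the instruction */
}
```

A real debugger would write the returned value back into the stacked EIP image before executing IRET.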


Programming the debug registers 

Figure 5-15 shows the layout of the debug registers. To load a value into one of the 
registers, use a MOV DRx, reg instruction. Similarly, using MOV reg, DRx reads the
contents of a debug register into one of the 32-bit general registers. 


The first four registers (DR0-DR3) are address registers. The linear address of a
desired breakpoint must be loaded into one of these registers. The debug registers
are not affected by paging. Only the linear address (from the descriptors) is used to
match a breakpoint address. Debug registers DR4 and DR5 are reserved for future
Intel microprocessors.


Figure 5-15. Debug registers.

Register DR6 is the status register. It indicates the conditions that lead to the
interrupt. A bit is set to 1 in DR6 if the condition associated with the bit has been met.
The following table identifies the bits and the reasons for the interrupt.

Bit   Reason

B0    Breakpoint register 0 triggered
B1    Breakpoint register 1 triggered
B2    Breakpoint register 2 triggered
B3    Breakpoint register 3 triggered
BD    Intel ICE hardware active
BS    Single step (TF set to 1)
BT    Task switch occurred; new task's TSS T bit set to 1

Bits B0-B3 are set to 1 if the breakpoint in DR0-DR3 was matched during execution,
even if the breakpoint was not enabled and did not cause the debug fault.


When Intel ICE hardware is used, the debug registers are reserved for the in-circuit 
emulator. The BD bit is set to 1, and any attempt to place (MOV) a value into one of 
the debug registers triggers an interrupt 1. 


The debug interrupt handler must clear the contents of register DR6. The CPU sets 
bits, but bits can be cleared only programmatically. 
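
A handler's first job is to find out why it was invoked, which means testing the DR6 bits. The sketch below decodes a DR6 value that has already been read into a variable. The bit positions are those of Intel's documented DR6 layout (B0-B3 in bits 0-3, BD in bit 13, BS in bit 14, BT in bit 15); the function names are hypothetical, and the privileged MOV that reads and later clears DR6 is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define DR6_BD (1u << 13)   /* debug registers in use by ICE hardware */
#define DR6_BS (1u << 14)   /* single step */
#define DR6_BT (1u << 15)   /* task switch to a TSS with T = 1 */

/* Nonzero if address breakpoint n (0-3) was matched. */
static int dr6_breakpoint_hit(uint32_t dr6, unsigned n)
{
    assert(n < 4);
    return (int)((dr6 >> n) & 1u);
}

static int dr6_single_step(uint32_t dr6) { return (dr6 & DR6_BS) != 0; }
static int dr6_task_switch(uint32_t dr6) { return (dr6 & DR6_BT) != 0; }
static int dr6_ice_active(uint32_t dr6)  { return (dr6 & DR6_BD) != 0; }
```

Because a B bit can be set by a disabled breakpoint, a handler would normally combine this test with the enable bits in DR7.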


DR7 is the debug control register. Merely placing an address in DR0-DR3 will not
enable a breakpoint. The enable bit(s) in DR7 must be set, as must the breakpoint 
length and condition. 


The LENn fields let you specify the length of breakpoint n. The length values are
encoded as follows:

00 - Byte / breakpoint legal at any address
01 - Word (2 bytes) / breakpoint must be on even address
10 - Reserved for future use
11 - Dword (4 bytes) / breakpoint address must be on dword boundary


The R/Wn field allows you to specify the type of memory access that triggers
breakpoint n. This field is encoded as follows:

00 - Execution breakpoint
01 - Memory write breakpoint
10 - Reserved for future use
11 - Memory read or write breakpoint


When R/W is set to 00B, an execution breakpoint, the corresponding LEN field also
must be set to 00B. An execution breakpoint is triggered only if the breakpoint
address is set to the first byte of the instruction. If any prefix bytes are part of
the instruction, the breakpoint must be set to the address of the first prefix byte.

The Ln and Gn bits allow breakpoints to be locally or globally enabled. If neither
the L nor the G bit is set, the breakpoint is disabled and does not trigger an
interrupt, although the corresponding bit in DR6 is set if the breakpoint condition is met.


If only the L bit is set, the breakpoint is locally enabled. A task switch clears the L 
bits. The system should mark the T bit in the TSS of the task using locally enabled 
breakpoints so that an interrupt 1 occurs when the task is reactivated. Then the L 
bits can be reset. 


If the G bit is set, the breakpoint is globally enabled and can be disabled only by 
clearing G to 0. (Setting both the L and G bits equals setting the G bit.) 


Register DR7 contains two other bits, LE and GE. When either bit is set, it enables 
the exact match condition. When exact match is enabled, the processor slows to en- 
sure that the interrupt 1 fault reports the instruction that triggered the breakpoint. If 
LE and GE are 0, the execution unit might get ahead of the debug unit because of 
the internal parallelism in the processor, and the CS and EIP on the interrupt han- 
dler stack might point one or two instructions beyond the one that triggered the 
fault. The performance loss is not significant, and LE and GE should be enabled. The 
difference between the two bits is that LE is cleared after a task switch, as are the 
Ln bits. 
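
The DR7 fields described in this section can be assembled with a few shifts. The sketch below assumes Intel's documented DR7 layout (Ln in bit 2n, Gn in bit 2n + 1, LE in bit 8, GE in bit 9, R/Wn in bits 16 + 4n and 17 + 4n, LENn in the two bits above that). The names dr7_enable and bp_address_aligned are hypothetical, and loading the result into DR7 requires a privileged MOV that is not shown.

```c
#include <assert.h>
#include <stdint.h>

/* Field encodings from the LENn and R/Wn lists. */
enum bp_len { BP_LEN_BYTE = 0x0, BP_LEN_WORD = 0x1, BP_LEN_DWORD = 0x3 };
enum bp_rw  { BP_RW_EXEC = 0x0, BP_RW_WRITE = 0x1, BP_RW_READWRITE = 0x3 };

#define DR7_LE (1u << 8)   /* local exact match */
#define DR7_GE (1u << 9)   /* global exact match */

/* Return dr7 with breakpoint n (0-3) configured and enabled. */
static uint32_t dr7_enable(uint32_t dr7, unsigned n, enum bp_rw rw,
                           enum bp_len len, int global)
{
    assert(n < 4);
    /* An execution breakpoint must use the byte length encoding. */
    assert(rw != BP_RW_EXEC || len == BP_LEN_BYTE);

    unsigned field = 16 + 4 * n;
    dr7 &= ~(0xFu << field);            /* clear old R/Wn and LENn */
    dr7 &= ~(0x3u << (2 * n));          /* clear old Ln and Gn */
    dr7 |= (uint32_t)rw  << field;
    dr7 |= (uint32_t)len << (field + 2);
    dr7 |= 1u << (2 * n + (global ? 1 : 0));
    return dr7;
}

/* Each LEN encoding equals (length - 1), so it doubles as the
   alignment mask for the breakpoint address. */
static int bp_address_aligned(uint32_t addr, enum bp_len len)
{
    return (addr & (uint32_t)len) == 0;
}
```

Following the recommendation above, a caller would typically OR DR7_LE or DR7_GE into the result as well.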


Triggering the debug interrupt 

The following table shows how the address and control fields define a breakpoint
condition and gives examples of instructions that do or do not trigger the
breakpoint. The table assumes a base address of CS = 0003A000H and DS = 0004C000H
and that G0 = 0.


(L0, R/W0, and LEN0 are fields of DR7.)

DR0        L0  R/W0  LEN0  Instruction            Breakpoint  Reason

0004C020H  1   00B   00B   MOV AL, [20]           N           Execution breakpoint
0004C020H  1   11B   00B   MOV AL, [20]           Y           Byte 4C020H read
0004C020H  1   01B   00B   MOV AL, [20]           N           Breakpoint on write access only
0004C020H  1   11B   11B   MOV AL, [23]           Y           Breakpoint covers 4 bytes
0004C020H  1   11B   11B   INC DWORD PTR [01E]    Y           Dword extends into breakpoint area
0004C020H  0   11B   11B   INC DWORD PTR [01E]    N           Breakpoint not enabled
0003A000H  1   00B   00B   CS:0000 MOV AL, 37H    Y           Execution breakpoint
0003A001H  1   00B   00B   CS:0000 MOV AL, 37H    N           Execution breakpoint not at first byte of instruction

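
The rows of the table can be reproduced with a small matching function. This is a simplified model for illustration, not a description of the hardware: it treats a data breakpoint as an address range, an access as another range, and reports a hit when the ranges overlap and the access type qualifies. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum access_type { ACC_EXEC, ACC_READ, ACC_WRITE };

struct breakpoint {
    uint32_t addr;     /* linear address held in DRn */
    unsigned len;      /* LENn encoding: 0, 1, or 3 (length - 1) */
    unsigned rw;       /* R/Wn encoding: 0, 1, or 3 */
    int      enabled;  /* nonzero if Ln or Gn is set */
};

/* addr is the linear address of the access; for ACC_EXEC it is the
   address of the first byte of the instruction, including prefixes. */
static int bp_triggers(const struct breakpoint *bp, uint32_t addr,
                       unsigned size, enum access_type type)
{
    assert(size >= 1);
    if (!bp->enabled)
        return 0;
    if (bp->rw == 0)                      /* execution breakpoint */
        return type == ACC_EXEC && addr == bp->addr;
    if (type == ACC_EXEC)                 /* data breakpoints ignore fetches */
        return 0;
    if (bp->rw == 1 && type != ACC_WRITE) /* write-only breakpoint */
        return 0;

    uint32_t bp_last  = bp->addr + bp->len;   /* last byte covered */
    uint32_t acc_last = addr + (size - 1);    /* last byte accessed */
    return addr <= bp_last && acc_last >= bp->addr;
}
```

With DS based at 0004C000H, MOV AL, [23] becomes a 1-byte read at 0004C023H, and INC DWORD PTR [01E] becomes a 4-byte read and write starting at 0004C01EH.
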

MEMORY ARCHITECTURE: PAGING AND CACHE MANAGEMENT


This chapter covers the paging mechanism, which is nearly identical in both the 
80386 and 80486, and the internal cache, which is present only in the 80486. Many 
computer systems built with 80386s have caches, but the cache is implemented in 
external hardware. In some 80486 machines, there will be two caches: the 8-KB in- 
ternal cache and a system cache similar to those in advanced 80386 systems. This 
chapter’s descriptions refer only to the 80486 internal cache, but 80386 users may 
still be interested in it, as the general concepts apply to any caching system. 


Paging 
Paging is used to implement virtual memory based on fixed-size blocks called 
pages. Paging is probably the most widely used virtual memory technique on to- 
day’s minicomputers and mainframes. 


Like segmentation, paging translates virtual addresses into physical addresses. 
Addresses are translated by mapping fixed-size blocks of memory into physical 
memory locations called page frames. Consider a physical memory system com- 
posed of page frames 0, 1, 2, and 3, each having 10 bytes of memory. A virtual ad- 
dress consists of a frame name and an offset, so assume that the frames have the 
names A, B, C, and D. The memory system also contains a page table for converting 
the virtual address into a physical address. Figure 6-1 shows how virtual address C7
is mapped into physical address 17. The arrows indicate the page mapping. 


Figure 6-1. Translating a virtual address to a physical address.


Segmentation and paging are similar: A name and an offset are translated to an
address. This mapping is the essence of virtual memory. However, segmentation and
paging are also different. Assume that any virtual address from the previous
example consists of a two-digit number and that the digit in the 10's place is the frame
name, rather than a letter, as in Figure 6-1. A virtual memory translation would re- 
semble Figure 6-2. In this example, virtual address 27 is translated to physical ad- 
dress 17. 


Because pages have a fixed size, a virtual address can be easily separated into a 
name and an offset. A page table lookup converts every virtual address into a physi- 
cal address. 
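
The decimal example can be written out directly. In this sketch the only mapping taken from the text is virtual page 2 to page frame 1; the other page table entries in the usage example are made up, and toy_translate is a hypothetical name.

```c
#include <assert.h>

#define TOY_PAGE_SIZE 10   /* bytes per page in the example */
#define TOY_PAGES     4    /* page frames 0 through 3 */

/* Split a two-digit virtual address into page name and offset,
   look the name up in the page table, and rebuild the address. */
static int toy_translate(const int page_table[TOY_PAGES], int vaddr)
{
    int page   = vaddr / TOY_PAGE_SIZE;   /* digit in the 10's place */
    int offset = vaddr % TOY_PAGE_SIZE;   /* digit in the 1's place */
    assert(page >= 0 && page < TOY_PAGES);
    return page_table[page] * TOY_PAGE_SIZE + offset;
}
```

With a table of { 3, 0, 1, 2 }, toy_translate(table, 27) selects entry 2, finds frame 1, and yields physical address 17.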


Figure 6-2. Virtual address translation of fixed-size elements.


Advantages and disadvantages 


A fixed page size is the key to the advantages of paging over segmentation. Because 
a disk is usually the secondary storage for a virtual memory system, you can choose 
page sizes that map well into the sector size of the disk. Paging also avoids the frag- 
mentation problem of segmentation. Every time a page is swapped out, another 
page fits exactly into the freed page frame. 


Another advantage of paging is that allocation for a large object (for example, a 
memory segment) does not have to be contiguous. An object that was contained in 
virtual pages 1 and 2 in Figure 6-2 would not be stored in consecutive physical 
memory locations.


Finally, paging is invisible to the programmer. Unlike segmentation, which requires 
you to know the virtual name (segment) and offset of an object in memory, paging 
requires you to know only one address. The virtual address is broken down into its
components by the virtual memory mechanism in the hardware. 


Paging isn’t perfect. Using paging means losing the protection rings implemented 
with segmentation. Paging is also subject to a different kind of fragmentation, called 
internal fragmentation, which occurs when you store objects that do not fit into a 
page or a sequence of pages. For example, if the page size is 10 bytes, an 11-byte ob- 
ject requires two pages, which wastes memory. 
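
The cost of internal fragmentation is simple arithmetic: round the object size up to a whole number of pages and subtract. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole pages needed to hold an object. */
static size_t pages_needed(size_t object_size, size_t page_size)
{
    assert(page_size > 0);
    return (object_size + page_size - 1) / page_size;
}

/* Bytes allocated but unused in the last page. */
static size_t wasted_bytes(size_t object_size, size_t page_size)
{
    return pages_needed(object_size, page_size) * page_size - object_size;
}
```

For the 11-byte object and 10-byte page above, pages_needed returns 2 and wasted_bytes returns 9.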


Additionally, paging incurs more overhead than does segmentation. In a segmented 
system, the table lookups that are needed to convert a virtual address to a physical 
one occur only when a new segment is loaded. In a paged system, a virtual-to- 
physical translation must be performed for every memory access. This would not be 
an issue if the entire page table could be stored in the CPU, but processors with 
gigabyte address spaces require very large page tables. 


These problems are not insurmountable, however. You can implement a simple pro- 
tection scheme with paging alone; you can also use segmentation and paging to- 
gether. Internal fragmentation is not usually as serious as segment fragmentation, 
and the CPU’s internal parallelism and a special cache called the translation look- 
aside buffer (TLB) are used to help alleviate the page translation overhead. The TLB 
is a special-purpose cache used only by the paging unit. It exists in all members of 
the 80386 family and is not to be confused with the internal cache of the 80486. 


The Intel paging implementation 


The size of a page frame on the 80386 family is 4096, or 2^12, bytes. Paging is
enabled when the PG bit of CR0 is set to 1. (Once paging is enabled, usually by
operating system software, it will probably not be disabled.) Translation treats the 
linear address generated by the segmentation unit as a virtual address and performs 
page mapping on it. Thus, memory references on the 80386 family go through the 
following stages: 


Segment:offset -> linear address -> physical address


A linear address is a 32-bit value. To interpret it as a virtual address, take the high- 
order 20 bits as a frame name, and use the low-order 12 bits as an offset into the 
4096-byte page. To generate a 32-bit physical address, each entry in the page table 
must translate the frame name to a frame address. Frame address 0 corresponds to 
physical addresses 0-4095, frame address 1 identifies physical addresses 4096-8191,
and so on. A page table entry must also provide additional page status bits for a pro- 
tection model and for swapping. Thus, a page table entry has this format: 


Bits 31-12   Page frame address 31...12
Bits 11-9    Avail
Bits 8-7     0
Bit 6        D
Bit 5        A
Bit 4        PCD*
Bit 3        PWT*
Bit 2        U/S
Bit 1        R/W
Bit 0        P

* 80486 only

The bits marked 0 are reserved for use by future Intel processors. The field marked
Avail can be used by system programmers to mark pages that are shared among 
tasks, to hold usage information, or to store other paging data. The page frame ad- 
dress becomes the high-order bits of the physical address. The CPU sets the D 
(dirty) bit to 1 when a write operation occurs within the specified page. The CPU 
sets the A (accessed) bit to 1 when any memory access (read, write, or fetch) occurs 
within the page. 


The PCD (page cache disable) and PWT (page write-through) bits are the page-level
equivalent of the CD and NW bits in control register 0. PCD is used to disable
caching or cache write-through on a page-by-page basis. PWT is a “policy” bit only 
for external cache hardware; it has no effect on the processor. Setting PWT = 1 
defines a “write through” cache policy: PWT = 0 stands for “write back.” Because 
the 80386 has no internal cache, these bits should always be set to 0 in 80386 
software.


The U/S and R/W bits are part of paging’s protection mechanism. They are dis- 
cussed in this chapter’s “Page Protection” section. 


When the P (present) bit is set to 1, the page is present in memory. If P = 0, the page 
is assumed to be swapped to disk, and any attempt to access the page results in a 
page fault (interrupt 14). When P = 0, all other bits in the page table (31-1) are irrele- 
vant and can be used by the system programmer. Frequently, a swapped page’s loca- 
tion on disk is stored in those bits when the page is not present. 
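
In system code, the page table entry fields are usually handled as bit masks. The sketch below assumes the standard bit positions (P in bit 0, R/W in bit 1, U/S in bit 2, PWT in bit 3, PCD in bit 4, A in bit 5, D in bit 6, Avail in bits 9-11). The macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_P          (1u << 0)   /* present */
#define PTE_RW         (1u << 1)   /* writable */
#define PTE_US         (1u << 2)   /* user page */
#define PTE_PWT        (1u << 3)   /* 80486 only */
#define PTE_PCD        (1u << 4)   /* 80486 only */
#define PTE_A          (1u << 5)   /* accessed, set by the CPU */
#define PTE_D          (1u << 6)   /* dirty, set by the CPU */
#define PTE_FRAME_MASK 0xFFFFF000u /* page frame address, bits 31-12 */

/* Build an entry from a 4-KB-aligned frame address and status flags. */
static uint32_t pte_make(uint32_t frame_addr, uint32_t flags)
{
    assert((frame_addr & ~PTE_FRAME_MASK) == 0);
    assert((flags & PTE_FRAME_MASK) == 0);
    return frame_addr | flags;
}

static int      pte_present(uint32_t pte) { return (pte & PTE_P) != 0; }
static uint32_t pte_frame(uint32_t pte)   { return pte & PTE_FRAME_MASK; }
```

When pte_present returns 0, the remaining 31 bits carry whatever the operating system stored there, so pte_frame is meaningful only for present entries.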


Page tables and page directories 


Each page is 2^12 bytes, and physical address space is 2^32 bytes, so 2^20 (more than 1
million) page table entries are required to implement a virtual-to-physical
translation table. Because each entry takes up 4 bytes, a page table requires 4 MB of
memory. If a frame address alone indicated the page table entry, the page table
would require 4 MB of contiguous memory. In a multitasking system that provides
a separate virtual address space for each task, each task requires a 4-MB block of
memory in addition to its code and data.


The solution to this space problem, swapping out the page table, cannot be imple- 
mented with a simple, one-level page table. For example, if a program tries to access 
address x, the page table entry (PTE) for x must be brought into memory. Because 
the page table is itself paged, the PTE for PTE(x) must be brought into memory first.
Swapping continues until the initial page of the page table is swapped in. 


A better solution, the one implemented by the 80386 family, is a two-level page 
table. In this scheme, the virtual name component of the virtual address (the high- 
order 20 bits) is split into two parts. The high-order 10 bits are used as an index into 
a page directory. A page directory entry (PDE) points to a scaled-down page table 
that contains 1024 entries. The 10 bits left over in the virtual address select the page 
table entries from the page table. Figure 6-3 illustrates the two-level page structure. 

Figure 6-3. Page table/directory structure.


This structure solves the problem of swapping out the page table because the initial 
lookup goes through the page directory. The page directory, with 1024 32-bit en- 
tries, takes up only 4 KB and is permanently stored in memory. Each page table also 
takes up 4 KB (fits right into a page!) and has 1024 page table entries. 


Register CR3 contains the physical address of the page directory for a task. CR3 is 
the only register that contains a physical (as distinct from virtual) memory address. 
A page directory entry has the same format as a page table entry except that the D 
bit is unused and the A bit is set to 1 whenever one of the page tables pointed to by 
the page directory is used.


A detailed example 

Figure 6-4 shows a linear address that is translated to a physical address via
paging. Assume that an instruction refers to the linear address
13A49F01H. The frame name (13A49H) is split into a directory index (04EH) and a
page table index (249H). The page directory is at the address specified by register
CR3, location 1C000H. The page directory element number 04EH is selected. It
contains the value 3A7A2xxxH, where xxx represents the page status bits. If the
present bit is set, the page table begins at location 3A7A2000H, and page table entry
number 249H is selected. In the example, this entry contains the value 2C115xxxH,
where xxx represents the contents of the status bits. The offset of the linear address
is appended to the page frame to yield a physical address of 2C115F01H.

Linear address 13A49F01H = 0001001110 1001001001 111100000001 B
                           04EH (78)  249H (585) F01H

Page directory (at CR3 = 1C000H), entry 78:   3A7A2xxxH
Page table (at 3A7A2000H), entry 585:         2C115xxxH
Physical address:                             2C115F01H

Figure 6-4. Page translation process.


As the example shows, referring to a single memory location when paging is 
enabled requires three references: a memory read of the page directory, a read of 
the page table, and the target memory access. 
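
The three-step lookup can be sketched as a software page walk. This is an illustration only: the processor performs these steps in hardware, and the names split_linear, translate, and example_read32 are hypothetical. The example_read32 function stands in for physical memory and holds just the two entries from the example, with the status bits (shown as xxx in the text) assumed to be 001H so that the present bit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct linear_parts {
    uint32_t dir_index;    /* bits 31-22 */
    uint32_t table_index;  /* bits 21-12 */
    uint32_t offset;       /* bits 11-0 */
};

static struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;
    p.dir_index   = linear >> 22;
    p.table_index = (linear >> 12) & 0x3FFu;
    p.offset      = linear & 0xFFFu;
    return p;
}

/* Hypothetical stand-in for a 32-bit read of physical memory. */
typedef uint32_t (*phys_read_fn)(uint32_t phys_addr);

static uint32_t example_read32(uint32_t phys_addr)
{
    if (phys_addr == 0x0001C000u + 4u * 0x04Eu) return 0x3A7A2001u; /* PDE */
    if (phys_addr == 0x3A7A2000u + 4u * 0x249u) return 0x2C115001u; /* PTE */
    return 0;   /* every other entry: not present */
}

/* Returns 0 and stores the physical address, or -1 where the
   processor would raise a page fault (entry not present). */
static int translate(uint32_t cr3, uint32_t linear,
                     phys_read_fn read32, uint32_t *phys)
{
    assert(read32 != NULL && phys != NULL);
    struct linear_parts p = split_linear(linear);

    uint32_t pde = read32((cr3 & 0xFFFFF000u) + 4u * p.dir_index);
    if (!(pde & 1u))
        return -1;
    uint32_t pte = read32((pde & 0xFFFFF000u) + 4u * p.table_index);
    if (!(pte & 1u))
        return -1;
    *phys = (pte & 0xFFFFF000u) | p.offset;
    return 0;
}
```

Each table entry is 4 bytes, which is why the indexes are multiplied by 4 before being added to the table base.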


The translation lookaside buffer 


To eliminate the extra bus cycles that paging imposes on memory references, the 
paging unit contains the TLB, a content-addressable cache memory. The TLB stores 
the 32 most frequently used page table entries and page directory entries on the 
processor chip. Whenever a page table request occurs, the TLB is checked first. If 
the table entry is found (a “cache hit”), the processor translates the address with no 
additional memory overhead. More than 98 percent of all references result in a 
cache hit, leaving less than 2 percent of all memory references degraded by addi- 
tional cycles. 


The TLB is flushed whenever register CR3 is loaded with a new base address. Be- 
cause the table entries are cached on chip, maintaining page table consistency in 
multiprocessor environments is important. When one processor modifies a page 
table (that may be in another processor’s cache) or a page directory, the processor 
must signal the other processors and force them to flush their TLBs. The other pro- 
cessors must then load the modified tables. The LOCK prefix should precede any 
accesses to the page tables to eliminate simultaneous access. 

The references to the page tables and page directories are no different from
standard memory read cycles; as such, they will go through the 8-KB internal cache of
the 80486. Because page table hits are relatively infrequent (around 2 percent of
references), you may wish to keep page table information out of the internal cache,
saving cache space for application code and data. To do this, set the CD bit to 1 in
CR0 and the PCD bit to 1 in the page directory entries (but not in the page table
entries). The page tables themselves will not be cached; however, the data in each
page can be.


Page faults 


If a page descriptor is marked not present (P = 0), a page fault (interrupt 14) occurs.
When this happens, register CR2 stores the linear address that caused the fault, and
an error code is pushed onto the stack. Page faults can also be caused by violations
of the page protection rules, described in the next section. Chapter 5 contains
additional information about page faults in the "Interrupts and Exceptions" section.


Page protection 

The format of a page directory entry and of a page table entry includes bits marked 
U/S and R/W. The U/S bit specifies whether a page is a user page (U/S = 1) or a
supervisor page (U/S = 0). A supervisor page cannot be used by any procedure run- 
ning with a CPL of 3. However, a procedure with a CPL of 0, 1, or 2 can access a 
supervisor page. User pages are accessible regardless of the CPL. If a page directory 
entry is marked with U/S = 0, only a supervisor procedure can access pages in the 
page table pointed to by that directory entry, regardless of the U/S setting in the in- 
dividual page table entries. 


You can control the type of memory accesses allowed by setting the R/W bit. The
effect of the R/W bit is modified by the WP (write protect) bit in CR0. The 80386
does not have the WP bit, so its operation is equivalent to an 80486 operating with
WP = 0. In this mode, a user level program (CPL = 3) can read or execute from any
page where U/S = 1 and can write to any page with U/S = 1 and R/W = 1. A
supervisor level program (CPL <= 2) can read from, write to, or execute from all pages.


In the 80486, when the WP bit is set to 1, access to pages by user level programs is 
identical to the operation described above. Supervisor level programs, however, are 
restricted to writing only to those pages with the R/W bit set to 1, regardless of the 
U/S bit setting. The rules are summarized by the following formulas:


User programs (CPL = 3)

    read_access(addr)  = PDE(U/S) = 1 & PTE(U/S) = 1
    write_access(addr) = read_access(addr) & PDE(R/W) = 1 & PTE(R/W) = 1

Supervisor programs (CPL <= 2)

    read_access(addr)  = TRUE
    write_access(addr) = (WP = 0) | (PDE(R/W) = 1 & PTE(R/W) = 1)
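
The formulas translate almost line for line into C. The sketch below assumes both entries are present and takes the U/S and R/W bits from their standard positions (bits 2 and 1); the function names mirror the formulas, and the extra cpl and wp parameters are added for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PG_RW (1u << 1)   /* R/W bit of a PDE or PTE */
#define PG_US (1u << 2)   /* U/S bit of a PDE or PTE */

/* Nonzero if code running at privilege level cpl may read the page. */
static int read_access(unsigned cpl, uint32_t pde, uint32_t pte)
{
    assert(cpl <= 3);
    if (cpl <= 2)
        return 1;                              /* supervisor: always */
    return (pde & PG_US) && (pte & PG_US);     /* user: both U/S = 1 */
}

/* Nonzero if code at cpl may write the page. wp is the CR0 WP bit
   (always 0 on an 80386). */
static int write_access(unsigned cpl, int wp, uint32_t pde, uint32_t pte)
{
    int writable = (pde & PG_RW) && (pte & PG_RW);
    if (cpl <= 2)
        return !wp || writable;
    return read_access(cpl, pde, pte) && writable;
}
```

Note that the directory entry and the table entry must both grant a right before a user program receives it.
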

When a user level process loads a selector, issues a software interrupt, or generates
an access to the GDT, LDT, TSS, or IDT to load a descriptor, system table reads and 
writes are treated as supervisor level accesses. Pushing values onto an inner-ring 
stack segment is also treated as a supervisor level access. If the system tables had to 
be stored in user level pages, they would be less secure than if stored in supervisor 
level pages. 


Combined paging and segmentation 


Although simulating a pure flat address space is possible in the 80386 family, most 
operating systems will probably use some segmentation. No special restrictions ap- 
ply when combining segmentation and paging, although observing certain rules can 
make life easier for the operating system designer. 


For example, segments do not need to fit into a single page or into a multiple of n 
pages; a page can contain portions of more than one segment, or vice versa. How- 
ever, memory management is easier if all segments are multiples of 4096 bytes. You 
can mark all segment limits as page granular (G = 1 in the segment descriptor), and 
each segment limit field will contain the number of pages required to hold the seg- 
ment, less one. 


To support page protection, an operating system should implement at least level 0 
and level 3 segment protection rings. This is not a problem, even in systems simulat- 
ing a flat memory architecture. All user level programs can share the same level 3 
code segment and level 3 data segment, and the operating system can use two level 
0 segments. Both sets of segments can map into the same linear address space, so 
the use of different selectors will be invisible except for the privilege level. 


Multitasking 


Operating system designers can choose to support either a single memory map (one
for each task) or multiple memory maps (one for the system and one for each
application). A single virtual memory space is the simplest approach; however, any
system that supports multiple virtual 8086-mode tasks needs a different set of page
tables for each V86 task. In V86 mode, each task accesses linear addresses 0 to 1 MB.
A separate physical address space must exist for each linear address space. Figure 
6-5 shows how V86 tasks can be mapped to physical memory. 


The CPU architecture supports different page tables for each task by saving and 
restoring the CR3 register in the task state segment. To save itself from having one 
4-MB page table per task, an operating system can limit the linear address space of 
an application to a subset of paging’s 32-bit, 4-GB virtual memory size. 


For example, if an operating system limits each application to 8 MB of linear ad- 
dress space, it needs to manage only two page tables and the page directory. Each 
unused page directory entry is marked not present (P = 0). Trying to access an ille- 
gal memory address results in a page fault, and the operating system can tell 
whether the fault represents a swapped-out page or an illegal memory reference. 
Figure 6-6 illustrates such a system. 
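
The saving is easy to quantify. Each page table maps 1024 pages of 4096 bytes, or 4 MB, so the number of tables follows from the size of the linear address space. A minimal sketch with hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define PG_BYTES          4096u
#define ENTRIES_PER_TABLE 1024u
#define BYTES_PER_TABLE   (PG_BYTES * ENTRIES_PER_TABLE)   /* 4 MB */

/* Page tables needed to map linear_bytes of address space. */
static uint32_t page_tables_needed(uint32_t linear_bytes)
{
    assert(linear_bytes > 0);
    return (linear_bytes - 1) / BYTES_PER_TABLE + 1;
}

/* Memory consumed by those tables plus the page directory. */
static uint32_t paging_overhead_bytes(uint32_t linear_bytes)
{
    return (page_tables_needed(linear_bytes) + 1) * PG_BYTES;
}
```

For the 8-MB limit in the example, this gives two page tables and 12 KB of paging structures per task, compared with 4 MB for a full set of page tables.
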

Figure 6-5. Mapping V86 tasks to physical memory.

Figure 6-6. Page tables required to support 8 MB of memory.




MICROSOFT’S 80386/80486 PROGRAMMING GUIDE 


Application designers should understand address space restrictions. Some operating 
systems might have a way to request a larger virtual address space with a system 
call, but others might not. 


Performance is another concern for application designers in a demand-paged sys- 
tem. A key to system performance is the size of the application’s working set. The 
working set is the number of application pages that the operating system tries to 
keep in physical memory at one time. 


For example, assume that an application is computing the sum of two arrays into a 
third array, as represented by the following program fragment: 


int a[1024], b[1024], c[1024];
int i;

for (i = 0; i < 1024; i++)
    a[i] = b[i] + c[i];


The code for the program resides in one page, and each array (a, b, and c) resides
in a separate page. If the operating system provided a working set of three pages
per application, this program would run slowly because two pages would have to
be swapped to disk for every for loop iteration. Figure 6-7 illustrates the swap.

Figure 6-7. Swapping a working set.


Most operating systems provide working sets much larger than three pages per 
application, but applications with large memory requirements might see similar 
results. If you write an application that requires a large amount of memory, you 
might improve its performance by changing the program’s locality of reference. 


The previous program fragment needs access to many pages for every cycle through 
the loop. If this program were running under the operating system described pre- 
viously, you could increase its performance by changing the data structure so that 
a[i], b[i], and c[i] reside in the same page.

struct {
    int a, b, c;
} block[1024];
int i;

for (i = 0; i < 1024; i++)
    block[i].a = block[i].b + block[i].c;


The program now runs with only two page swaps, as shown in Figure 6-8. 

Figure 6-8. Reducing swapping via locality of reference.
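
The difference between the two layouts can be checked by counting how many distinct data pages one loop iteration touches. The sketch below works on byte offsets rather than real addresses and makes several assumptions: a 4-byte int, 4096-byte pages, data that starts on a page boundary, and the three arrays placed back to back. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096u
#define INT_BYTES  4u      /* assumption: 32-bit int */
#define ELEMENTS   1024u

/* Count the distinct pages among three byte offsets. */
static unsigned distinct_pages(size_t a, size_t b, size_t c)
{
    size_t pa = a / PAGE_BYTES, pb = b / PAGE_BYTES, pc = c / PAGE_BYTES;
    unsigned n = 1;
    if (pb != pa)
        n++;
    if (pc != pa && pc != pb)
        n++;
    return n;
}

/* Layout 1: three separate arrays, one page each. */
static unsigned pages_touched_arrays(size_t i)
{
    assert(i < ELEMENTS);
    size_t a = i * INT_BYTES;
    return distinct_pages(a, a + ELEMENTS * INT_BYTES,
                          a + 2 * ELEMENTS * INT_BYTES);
}

/* Layout 2: one array of { a, b, c } structures. */
static unsigned pages_touched_struct(size_t i)
{
    assert(i < ELEMENTS);
    size_t base = i * 3 * INT_BYTES;
    return distinct_pages(base, base + INT_BYTES, base + 2 * INT_BYTES);
}
```

Under these assumptions every iteration of the first layout touches three data pages, while the second touches one, except for the occasional element that straddles a page boundary (element 341, for example, because 12 does not divide 4096 evenly).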


Application designers should consider how paging affects their programs. Although 
many designers will see no impact on their programs, others might need to modify 
code. A classic example is a program such as a LISP interpreter, which manipulates
a large number of linked-list data structures. Unless a mechanism forces locality of
reference on the lists, a user could end up with lists that have pointers to cells
scattered throughout the address space, resulting in excessive swapping overhead.


The Internal Cache 


The 80486 introduced an 8-KB internal cache to the processor architecture. While 
the cache can be looked at strictly as a performance aid (as distinct from a true 
architectural change), it is the cache that allows the 80486 to achieve RISC-like 
speeds. A number of instructions execute in a single clock cycle when assisted by 
the cache. 


The purpose of the cache 


The memory requirements of computers have always outweighed the processing 
requirements. The ratio of storage locations to processor is several million to one, 
even in multiprocessor systems. This means that storage costs must be kept low, or 
the price of a complete system would be forbiddingly high. To keep memory cheap, 
it is usually implemented on devices that are much slower than the main processor. 
Disk storage is a typical example. Unfortunately, using only such slow devices
negates the value of having a fast processor, because the CPU spends all its time 
waiting for data to be read from or written to memory. 


One solution to this problem is to provide more than one kind of memory: A very 
fast memory for the most important stuff, and a slower memory for the stuff that’s 
not currently in use. We are all familiar with this setup. The familiar CPU/RAM/disk 
triad exemplifies this model. Current RAM technology, however, still lags CPU per- 
formance, at least at reasonable data densities. The cache is simply another variant 
of this model; the initial accesses are to the fastest memory and the cache; then sys- 
tem RAM is used, and then disk storage. 


To make effective use of very fast memory in the cache, it is necessary to reduce the 
problem of data density; the 80486 cache is only 8 KB. Because we can’t have a lot 
of cache memory, we’ll make that memory smarter. 


An intelligent RAM 


In a standard memory system, the CPU presents an address, and the memory system 
returns the data stored at that address. Because a cache is small (and can’t store all 
the data we'd like it to), its behavior is a little different. The cache can be looked 
upon as storing a set of ordered pairs, in the form (address, value). The CPU pre- 
sents the cache with a memory address. The cache looks through all its ordered 
pairs. If it finds an address match, it returns the associated value; otherwise, it 
passes the address to the standard memory system on the bus. When the value 
comes back from memory, the cache will store it in case the processor requests the 
value again. This process is illustrated in Figure 6-9. Finding a match in the cache is 
called a cache “hit,” and it eliminates the need to access the slower system RAM. 


Figure 6-9. Memory fetch with cache.

Memory writes are handled in a somewhat different manner. Writes always go to
system RAM because system memory must contain the correct values if the cache is 
ever disabled. First, however, the cache is checked for an address match. If a match 
occurs, then the cache value is updated. If no match occurs, the cache remains 
unchanged and only system RAM is updated. 


When new data is brought into the cache by a read or fetch cycle, it usually means 
that some other data must be disposed of. The cache checks to see which addresses 
have been accessed least frequently and replaces the least recently used ordered 
pair with the address and data just read from system RAM. This assures that tight 
program loops and frequently referenced variables will be accessed as quickly as 
possible. 
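
The read, write, and replacement rules can be modeled in a few lines of C. This is a deliberately simplified sketch of the behavior described in the text, not of the 80486 hardware: it stores one value per address in a tiny table of ordered pairs, searches the table linearly, and tracks age with a counter. The RAM is modeled as an array indexed by address, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_ENTRIES 4   /* tiny, for illustration */

struct cache_entry {
    int      valid;
    uint32_t addr;
    uint32_t value;
    unsigned last_used;   /* time of the most recent read */
};

struct cache {
    struct cache_entry e[CACHE_ENTRIES];
    unsigned clock, hits, misses;
};

/* Return the (address, value) pair for addr, or NULL on a miss. */
static struct cache_entry *cache_find(struct cache *c, uint32_t addr)
{
    for (int i = 0; i < CACHE_ENTRIES; i++)
        if (c->e[i].valid && c->e[i].addr == addr)
            return &c->e[i];
    return NULL;
}

/* Read through the cache, filling it from RAM on a miss. */
static uint32_t cache_read(struct cache *c, const uint32_t *ram, uint32_t addr)
{
    assert(c != NULL && ram != NULL);
    struct cache_entry *hit = cache_find(c, addr);
    c->clock++;
    if (hit) {
        c->hits++;
        hit->last_used = c->clock;
        return hit->value;
    }
    c->misses++;
    /* Use a free entry if there is one, else the least recently used. */
    struct cache_entry *victim = &c->e[0];
    for (int i = 0; i < CACHE_ENTRIES; i++) {
        if (!c->e[i].valid) { victim = &c->e[i]; break; }
        if (c->e[i].last_used < victim->last_used)
            victim = &c->e[i];
    }
    victim->valid = 1;
    victim->addr = addr;
    victim->value = ram[addr];
    victim->last_used = c->clock;
    return victim->value;
}

/* Writes always reach RAM; the cache is updated only on a hit. */
static void cache_write(struct cache *c, uint32_t *ram,
                        uint32_t addr, uint32_t value)
{
    struct cache_entry *hit = cache_find(c, addr);
    ram[addr] = value;
    if (hit)
        hit->value = value;
}
```

A zero-initialized struct cache is empty, so the first read of any address is a miss and the second is a hit.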


Cache lines and associativity 


Because of the way that a cache works (using a lookup-by-association technique), 
caches are sometimes referred to as associative memories. The amount of memory 
required to store the address portion of an ordered pair is not taken into account 
when determining the size of the cache. Thus, the 8-KB cache of the 80486 means 
that there is room for 8 KB of data values. 


In fact, to speed operation of the cache, to reduce the amount of memory required
for the address portion, and to decrease chip complexity, the full 32 bits of the
memory address are not stored. Instead, the cache is organized into sets and lines.


A cache line in the 80486 is simply 16 bytes of data. Whenever a cache miss occurs,
the cache loads the entire 16 bytes, beginning at address AND FFFFFFF0H, from
system memory. The 80486 bus supports a special “burst mode” expressly for this
purpose. This means that the cache need not store the low-order four bits of memory
addresses, because the entire 16 bytes is present in the cache. Loading an entire line 
also has the advantage of “prefilling” the cache, on the assumption that memory ad- 
dresses are frequently localized and often sequential. 


Notice that the 80486 will always fill an entire line. If, for instance, you accessed the 
byte at location 3A75H, and there was a cache miss, the processor would start a 
burst read at 3A70H and cache the 16 bytes through 3A7FH. 
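The line arithmetic in this example can be checked with a few lines of Python. The helper names `line_range` and `line_offset` are invented for this sketch; only the 16-byte line size and the FFFFFFF0H mask come from the text.

```python
LINE_SIZE = 16  # bytes per 80486 cache line


def line_range(address):
    """Return the first and last byte address of the line containing address."""
    base = address & 0xFFFFFFF0      # clear the low-order four bits
    return base, base + LINE_SIZE - 1


def line_offset(address):
    """Return the position of the byte within its line (the four bits not stored)."""
    return address & 0xF


first, last = line_range(0x3A75)
print(f"{first:04X}H through {last:04X}H, byte {line_offset(0x3A75)} of the line")
```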


Cache control


A number of factors influence whether or not a line of data will be cached. Initially,
there is simply the question “Is the cache enabled?” to contend with. Bits 29 and 30
of control register 0 control the cache on a global basis. After the cache has been
enabled by setting bits 29 and 30 to 0, it can be flushed by a hardware signal,
disabled on a page-by-page basis in software, or disabled on a line-by-line basis in
hardware.
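As a small illustration of the global enable test, the following Python sketch treats CR0 as a plain integer. The constant and function names are chosen for this example; the bit positions (NW in bit 29, CD in bit 30) follow the text, and the value 60000000H is the 80486 reset value of CR0.

```python
CR0_NW = 1 << 29   # bit 29 of CR0
CR0_CD = 1 << 30   # bit 30 of CR0


def cache_enabled(cr0):
    """The cache is globally enabled only when bits 29 and 30 are both 0."""
    return (cr0 & (CR0_CD | CR0_NW)) == 0


def enable_cache(cr0):
    """Return the CR0 value with bits 29 and 30 cleared, other bits untouched."""
    return cr0 & ~(CR0_CD | CR0_NW) & 0xFFFFFFFF


reset_cr0 = 0x60000000          # 80486 value at reset: cache disabled
print(cache_enabled(reset_cr0), cache_enabled(enable_cache(reset_cr0)))
```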


The hardware signal FLUSH\ is asserted when external hardware wants the 80486
to invalidate all current cache lines. Caching remains enabled, but all current data is
marked invalid. This is useful in multiprocessor systems with shared memory.
When one processor writes to shared memory, it can force the other processors to
flush their caches, ensuring that they read the fresh data.


The page table entries contain mask bits for the CD and NW bits in CR0. By masking
these values, individual memory pages can be marked as never cacheable. In IBM 
PC-compatible systems, for example, the video memory locations are almost never 
read, and it is more efficient to prevent them from being cached altogether. 


Finally, external hardware can detect that certain addresses should not be cached. 
Addresses of memory-mapped I/O devices, for example, should not be cached. Ex- 
ternal hardware can use the KEN\ line to enable caching or to ensure that data at a 
particular address is not cached.


7: Three in One


In earlier chapters I alluded to the capability of Intel microprocessors to run soft-
ware written for previous processor generations. This chapter explores this capa- 
bility in the 80386 and the 80486 and discusses how to make the most of it. 


The 80386 and the 80486 provide an almost ideal upgrade path from the 8086 and 
80286 families of Intel processors. In real mode, the new 32-bit machines can run 
8086-family programs. They can switch into protected mode and execute 80286 
software. The native mode of the 80386 and the 80486 expands the protected-mode 
capabilities with 32-bit operations and eliminates the 64-KB segment restrictions of 
the 80286. Virtual 8086 mode lets you run real-mode programs in protected mode; 
this is advantageous because many more real-mode applications are currently avail- 
able than protected-mode applications. With the release of Windows 3.0 and OS/2 
V2.0, however, this situation is almost certain to change in the 1990s. 


Real Mode


When the 80386 or the 80486 is powered up or reinitialized via the hardware 
RESET\ line, the CPU is in real (real-address) mode. In real mode, all of the CPU’s 
protection features are disabled, paging is not supported, and program addresses 
correspond to physical memory addresses. The address space is limited to 1 MB of 
physical memory. Real mode is compatible with the 8086, the 8088, the 80186, the 
80188, and real mode of the 80286. Minor differences in real mode among the 
various processors are listed in Appendix F. 


When the processor is reset, the registers are initialized to the values shown in the
following table:


Register       Value                     Explanation

DH             3 or 4                    3 for 80386, 4 for 80486
DL             <id>                      Identifies revision number of CPU
EFLAGS         2
IDTR           0 (base), 3FFH (limit)    (See “Interrupt Processing,” below)
CS             F000H                     Descriptor base set to FFFF0000H
IP             FFF0H                     First instruction at FFFFFFF0H
SS             0                         Base address 0
ESP            ?                         Undefined, load SS:ESP before using stack
DS             0                         Base address 0
ES             0                         Base address 0
FS             0                         Base address 0
GS             0                         Base address 0
CR0 (80486)    60000000H                 Cache disabled
CR0 (80386)    000000..0H                Bit 4 = 1 if 80387 present, 0 otherwise.
                                         Bits 5-31 are undefined


Memory addressing 


Shadow registers (segment descriptor caches) provide a key to understanding real- 
mode memory addressing. Each segment register that holds a selector has an in- 
visible component called a shadow register. In protected mode, every time a selec- 
tor is loaded into a segment register, the contents of the descriptor indicated by the 
selector are loaded into the shadow portion. In real mode, the shadow register is 
loaded with a computed value rather than with a value extracted from a descriptor. 
Figure 7-1 illustrates the shadow registers. 


When the processor is reset, the shadow registers for segments other than CS are
loaded with a base address value of 0 and a limit of 0FFFFH, with attributes set to
16-bit addressing; 16-bit instruction set; read, write, and execute ability; and
privilege level 0. The CS shadow registers are set with the same limit and access
bits as the other shadow registers but have a base address of FFFF0000H. Except for
the registers listed in the above table, 80386-family registers are undefined. There
is one exception to this in the 80486. If the chip’s built-in self test (BIST) has been
enabled at reset time (by activating pin AHOLD during the falling edge of the
RESET signal), register EAX will be set to zero if the BIST completed successfully.


Figure 7-1. Shadow registers. (Each segment register has a visible 16-bit portion
that the programmer can access and an “invisible” descriptor cache, not accessible,
that holds the base, limit, and access rights.)


At reset, the limit portions of the shadow registers are set to 0FFFFH, which
indicates a 64-KB segment. The access rights portion is set to a value indicating that
the segment is readable, writable, and executable and that 16-bit addressing and
operand modes are enabled. These values remain constant while the processor is in
real mode, and only the base address value is altered. Each time a segment register
is loaded, the base address portion of the shadow register is set to 16 times the value
of the selector. For example, loading DS with the value of 001AH sets the base
address of the DS segment to 01A0H. Because all the segments in real mode are 64 KB,
the segment addressable via DS extends from 01A0H to 1019FH. Figure 7-2 illustrates
physical address generation in real mode.
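The “selector times 16” rule is easy to verify numerically. The following Python sketch reproduces the DS example; the function names are chosen for this illustration only.

```python
def real_mode_base(selector):
    """Base address loaded into the shadow register: 16 times the selector."""
    return (selector & 0xFFFF) << 4


def segment_range(selector):
    """First and last address of the 64-KB segment selected in real mode."""
    base = real_mode_base(selector)
    return base, base + 0xFFFF


def physical_address(selector, offset):
    """Base address plus a 16-bit offset."""
    return real_mode_base(selector) + (offset & 0xFFFF)


low, high = segment_range(0x001A)
print(f"DS = 001AH covers {low:05X}H to {high:05X}H")
```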


The highest segment base address that can be generated in real mode is 0FFFF0H,
16 bytes short of 1 MB. Because that segment extends for 64 KB, memory beyond
1 MB can be addressed. Thus, 32-bit real-mode addressing is somewhat incompatible
with that of the 8086, whose hardware address lines limit it to 1 MB. Generally, this
difference can be ignored because 8086 programs do not depend on it. If needed,
external hardware can be added to limit system address space in 80386 systems to
20 bits while the system is operating in real mode; in 80486 systems, activating pin
A20M\ forces address wraparound at 1 MB.
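A short Python sketch shows how far past 1 MB a real-mode address can reach and what forcing address line 20 to zero does to it. The function names are assumptions for this example, and the masking function only models the effect on the address value, not the pin itself.

```python
def linear(selector, offset):
    """Real-mode address: selector times 16 plus a 16-bit offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def with_a20_masked(address):
    """Model 8086-style wraparound by forcing address bit 20 to zero."""
    return address & ~(1 << 20)


top = linear(0xFFFF, 0xFFFF)     # highest address reachable in real mode
print(f"{top:06X}H becomes {with_a20_masked(top):06X}H with bit 20 masked")
```

The highest reachable address, 10FFEFH, lies almost 64 KB above the 1-MB boundary; with bit 20 masked it wraps to 00FFEFH, as it would on an 8086.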


The reset state of the CS shadow register does not follow the “selector times 16”
rule. Because the initial base address for the code segment is set to FFFF0000H,
ROMs that handle processor reset can be placed at the end of the address space.
The first CALL or JMP instruction that loads CS after reset forces the base address
into the first megabyte of address space.


Figure 7-2. Real-mode addressing.


16-bit instruction set


The predefined shadow register values cause another side effect. The D bit in the 
access rights field is always set to 0 in real mode. Thus, an 80386 or an 80486 is 
forced to operate in 16-bit mode unless it encounters an OPSIZ or ADRSIZ prefix. 


To understand how the D bit works, examine the 8086 instruction set. Most 8086 in- 
structions execute with either a byte operand or a word operand. The byte/word 
indicator is encoded in one bit in the instruction. For example, the opcode for negat- 
ing a byte operand is 11110110B, and the opcode for negating a word operand is 
11110111B. 


Rather than invent new opcodes for 32-bit (dword) operands, Intel’s designers
changed the meaning of the opcode bit that signifies a word operand. In a
native-mode (32-bit) segment, where the D bit in the segment descriptor is set to 1,
opcode 11110110B means negate byte and 11110111B means negate dword. The
instructions refer to bytes and dwords rather than to bytes and words. When the
D bit of a descriptor is set to 0, however, the opcodes retain their original meanings.


The D bit also affects address computation for memory operands and the stack. 
When D = 0, corresponding to the 8086, the 16-bit registers are used in calculating 
segment offsets, as in MOV AL, [SI+8]. When D = 1, corresponding to the 32-bit na- 
tive mode, the same opcode bits cause the memory address to be calculated using 
the 32-bit registers, and the instruction becomes MOV AL, [ESI+8]. When D = 0 in 
stack segment descriptors, PUSH and POP instructions access 16-bit operands. 
When D = 1, 32-bit pushes and pops are executed. 


The OPSIZ and ADRSIZ prefixes can override the current D-bit setting for an 
instruction. Thus, 32-bit native-mode instructions can be prefixed to use 16-bit 
operands, and 16-bit code can be prefixed to access 32-bit operands and 32-bit ad- 
dressing modes. The extended addressing features (such as indexing) are not avail- 
able in segments that have the D bit set to 0 unless the ADRSIZ prefix is used. Note: 
You need not explicitly specify the prefix instructions; use extended-addressing 
mode, and the assembler will insert the prefix. 
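The interaction of the D bit, the byte/word bit in the opcode, and the two prefixes can be summarized in a small decision function. The Python below is a sketch of that logic only. The function names are invented, and the prefix byte values 66H (OPSIZ) and 67H (ADRSIZ) are the standard encodings rather than something stated in this chapter.

```python
OPSIZ = 0x66    # operand-size prefix byte
ADRSIZ = 0x67   # address-size prefix byte


def operand_bits(d_bit, w_bit, prefixes=()):
    """Operand size selected by the D bit, the opcode's word bit, and OPSIZ."""
    if w_bit == 0:
        return 8                      # byte form is the same in both modes
    use_32 = bool(d_bit)
    if OPSIZ in prefixes:
        use_32 = not use_32           # the prefix toggles the default size
    return 32 if use_32 else 16


def address_bits(d_bit, prefixes=()):
    """Address size selected by the D bit and ADRSIZ."""
    use_32 = bool(d_bit)
    if ADRSIZ in prefixes:
        use_32 = not use_32
    return 32 if use_32 else 16


# Opcode 11110111B (word bit = 1) in a 16-bit segment, then in a 32-bit segment.
print(operand_bits(0, 1), operand_bits(1, 1), operand_bits(0, 1, (OPSIZ,)))
```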


When using extended addressing in real mode, observe the 64-KB segment size
limitation. In real mode, address offsets greater than 65535 cause an interrupt 13.


Interrupt processing 


Interrupt handling is different in real mode than in protected mode. As in protected
mode, the IDTR contains the base address and limit of the interrupt table. For 8086
compatibility, the base is initialized to physical address 0 with a limit of 3FFH. In
real mode, however, the interrupt table does not hold descriptors. Instead, each
interrupt has a 32-bit selector:offset address that points to the routine to be invoked
when an interrupt occurs. Thus, each entry is 4 bytes rather than 8 bytes. Figure 7-3
illustrates the real-mode interrupt vector table.


Figure 7-3. Real-mode interrupt vector table.


The processing of an interrupt in real mode is similar to that in protected mode except 
for the use of vectors instead of descriptors. A software or hardware interrupt causes 
the 16-bit FLAGS register to be pushed onto the stack, followed by the current CS and 
IP. The IF and TF flags are cleared to 0, disabling interrupts and single-stepping. 


The pointer from the interrupt table is loaded into CS and IP, and processing continues 
at the new location. Automatic task switching and interrupt gates are not present be- 
cause no descriptor tables exist in real mode. The vector in the interrupt table specifies 
a new execution address only. 
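The vector lookup can be modeled in Python by treating the first 1 KB of memory as a byte array. This is a sketch under stated assumptions: the helper names are invented, the table is assumed to sit at the default base of 0, and each 4-byte entry is assumed to hold the offset in its low word and the selector in its high word, as on the 8086.

```python
import struct


def read_vector(memory, n, idt_base=0):
    """Fetch the selector:offset pair for interrupt n from the vector table."""
    offset, selector = struct.unpack_from("<HH", memory, idt_base + 4 * n)
    return selector, offset


def handler_address(memory, n):
    """Physical address of the handler: selector times 16 plus offset."""
    selector, offset = read_vector(memory, n)
    return (selector << 4) + offset


memory = bytearray(0x400)                                  # 256 entries of 4 bytes
struct.pack_into("<HH", memory, 4 * 0x21, 0x1234, 0xF000)  # vector 21H -> F000:1234
print(f"{handler_address(memory, 0x21):05X}H")
```

A table of 256 four-byte entries occupies bytes 0 through 3FFH, which matches the IDTR limit given above.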


Real-mode restrictions 


You can use all the instructions added to the architecture since the introduction of 
the 8086, with the exception of: 


INVD* LSL VERR 
INVLPG* LTR VERW 
LAR SLDT WBINVD* 
LLDT STR 

*80486 only 


Real mode does not support the ways that these instructions access protected-mode
selectors, descriptors, or tables. Executing one of these instructions causes an
undefined opcode fault (interrupt 6).


You can execute all other 16-bit and 32-bit instructions. Real-mode programs can 
access any register, including the control, debug, and test registers. 


Real mode does not support paging. Setting the PG bit in register CR0 to enable
paging causes a protection fault.


Appendix F outlines the differences among the operations of the 8086, the 80286 in
real mode, the 80386, and the 80486.


Protected Mode


Setting the low-order bit of CR0 to 1 switches the processor into protected mode.
The processor will run in protected mode even if no setup is done—that is, it will 
run until the first interrupt, FAR program transfer, or segment register load. At this 
point, the processor needs to access a descriptor table. Because the protection 
mechanism depends on descriptor tables, the system will shut down if the descriptor 
tables have not been initialized. 


Protected-mode initialization requires you to set up a global descriptor table and
an interrupt descriptor table and to create a task state segment for the first process.
The initial descriptor tables can be stored in ROM, but they must be copied to RAM
before you set the GDTR and IDTR to point to them because the CPU needs to write
to the descriptors as well as read from them.


Figure 7-4 shows a simple initial GDT. This GDT would be sufficient to run addi- 
tional startup code. You could also build the operating system image in real mode 
and then switch into protected mode. An advantage of switching into protected 
mode as soon as possible after reset is that the hardware can help trap startup bugs 
early in the code development cycle. 


In Figure 7-4, GDT(0) is unused because a selector value of 0 is treated as a special
case, a NULL pointer. Thus, any descriptor at GDT(0) will never be used. GDT(1)
points to the GDT as a writable data segment, allowing the operating system to add,
delete, and change descriptors as needed. GDT(2) points to the IDT as a writable
data segment for the same reason. GDT(3) defines the TSS for the startup task,
GDT(4) defines the task’s data segment, and GDT(5) defines the task’s code
segments, which are in ROM.


Figure 7-4. A simple GDT.


Before you enable protected mode by setting the PE bit, the GDTR must be loaded
with the address and limit of the GDT. The IDT should contain gates that point to
code and that trap any faults that occur during startup. The IDTR is initialized to
point to the IDT, and TR is loaded with the selector of GDT(3). The PE bit is then set
in the CR0 register to enable protected mode. Next, a FAR jump instruction loads
the CS register with a valid protected-mode descriptor. Finally, the stack segment,
stack pointer, and data segment registers are loaded. The initialization code then
builds the rest of the operating system, enables paging, and starts application
programs.
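Loading TR “with the selector of GDT(3)” means forming a selector from a table index. The Python sketch below shows the standard selector layout (index in bits 3 through 15, table indicator in bit 2, requested privilege level in bits 0 and 1). The function names are invented for this example.

```python
def make_selector(index, ti=0, rpl=0):
    """Build a selector from a table index, table indicator, and RPL."""
    return (index << 3) | (ti << 2) | rpl


def split_selector(selector):
    """Return (index, table indicator, RPL) for a selector value."""
    return selector >> 3, (selector >> 2) & 1, selector & 3


def is_null(selector):
    """A selector that refers to GDT(0) is treated as a NULL pointer."""
    return (selector & 0xFFFC) == 0


tss_selector = make_selector(3)   # selector for GDT(3), the startup TSS
print(f"{tss_selector:04X}H", split_selector(tss_selector), is_null(0))
```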


80286 compatibility 


Protected-mode 80286 code executes on the 80386 or the 80486 if the fourth word 
of each descriptor is initialized to 0. Descriptors are 64 bits on all three processors, 
but the high-order 16 bits are unused on the 80286. On the 80386 and the 80486, the 
extra bits specify the high order of the base address and the limit fields and contain 
the G and D control bits. These new fields should be set to 0, restricting segment 
limits to 64 KB and activating the 16-bit instruction set (which is compatible with 
the 80286). 
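To make the role of the fourth word concrete, the following Python sketch packs a segment descriptor into its four 16-bit words. The function names and the sample base addresses are illustrative assumptions; the field positions follow the standard descriptor layout, and the access bytes 92H and 9AH are the usual values for a writable data segment and a readable code segment.

```python
def make_descriptor(base, limit, access, g=0, d=0):
    """Pack a segment descriptor into four 16-bit words, lowest word first."""
    word0 = limit & 0xFFFF                                   # limit bits 0-15
    word1 = base & 0xFFFF                                    # base bits 0-15
    word2 = ((base >> 16) & 0xFF) | ((access & 0xFF) << 8)   # base 16-23, access
    word3 = (((limit >> 16) & 0xF)                           # limit bits 16-19
             | (d << 6) | (g << 7)                           # D and G control bits
             | (((base >> 24) & 0xFF) << 8))                 # base bits 24-31
    return [word0, word1, word2, word3]


def is_286_compatible(words):
    """80286 code requires the fourth word of the descriptor to be 0."""
    return words[3] == 0


old_style = make_descriptor(0x012345, 0xFFFF, 0x92)
native = make_descriptor(0x12345678, 0xFFFFF, 0x9A, g=1, d=1)
print([f"{w:04X}" for w in old_style], [f"{w:04X}" for w in native])
```

A descriptor whose base fits in 24 bits, whose limit fits in 16 bits, and whose G and D bits are 0 automatically has a fourth word of 0.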


The 80286, the 80386, and the 80486 operate similarly; the few differences in opera- 
tion concern performance and newly implemented features and instructions. The 
80386 and the 80486 allow the LOCK prefix to precede the following instructions 
only when they modify memory: 


ADC     BTC     INC     SBB
ADD     BTR     NEG     SUB
AND     BTS     NOT     XCHG
BT      DEC     OR      XOR


Illegal use of the LOCK prefix results in a protection fault on the 80386 or the 80486.
Additionally, the 80286 locks all of physical memory during the instruction; on the
80386 and the 80486, the locked area is the memory region with the same starting
address and length as the operand of the locked instruction.


The machine status word (MSW) is the low-order 16 bits of register CR0. The MSW
is initialized to 0FFF0H on the 80286, but it is initialized to 0 on the 80386 and the
80486. Registers that are specified as undefined at reset might have different values
than they do on the 80286.


At reset, the base address of the CS register is different on the 80386 and the 80486
than on the 80286. The CS register is set to the same logical location (that is, to the
last 16 bytes of the address space), but the 80286 supports only 24-bit addresses,
whereas the 80386 and the 80486 support 32-bit addresses.


Returning to real mode


In general, an operating system should not switch to real mode after running in pro- 
tected mode. Returning to real mode compromises operating system security be- 
cause real mode is more vulnerable to crashes. To run real-mode programs while in 
protected mode, create special tasks that run in virtual 8086 (V86) mode. The next 
section discusses this process. 


If you must return to real mode, follow this procedure: If paging is enabled, turn it
off by branching to a routine whose linear and physical addresses are the same,
clearing the PG bit in CR0, and moving 0 into CR3 to clear the PDBR (page directory
base register), which will also flush the TLB.


The attribute bits in each segment descriptor must be set to values compatible with
real-mode operation; that is, they must be byte granular segments with a limit of
0FFFFH, and the B and D bits must be 0. CS must be marked “executable,” and SS,
DS, ES, FS, and GS should be writable segments. (Change the CS selector by issuing
a FAR jump or call instruction.)


Disable interrupts, and load the IDTR with a base address of 0 and a limit of 3FFH.
Clear the PE bit of the CR0 register to return to real mode, and execute a FAR jump
to flush the instruction queue and initialize CS to a valid real-mode base address.


After you load the stack pointer (SS:SP) and the other segment registers, programs 
can continue processing in real mode. 


Virtual 8086 Mode 


Just as virtual memory allows the processor to create the impression of memory that 
isn’t really there, virtual 8086 mode allows the 80386 and the 80486 to create the il- 
lusion of multiple 8086 processors. This illusion is so nearly complete that multiple 
8086-based operating systems can run under a supervisory protected-mode operat- 
ing system. For example, assume that the native-mode operating system for an 
80386 computer is UNIX and that support for V86 mode is built in. In addition to 
running multiple UNIX tasks, the user can run a copy of MS-DOS and a word
processor in a V86 window. The user can also invoke another virtual 8086 session
running a spreadsheet under Windows. Each V86 task believes that it is running on a
separate 8086 machine but actually runs concurrently with host operating system
tasks.


V86 mode was designed in response to the negative reaction to 80286 protected 
mode. Application designers developed a large software base for the 8086 family 
under MS-DOS. The 8086 and 8088 processors support only real-mode program- 
ming, and MS-DOS is sensitive to the mapping between selector values and physical 
addresses. When Intel introduced the 80286, developers found that MS-DOS pro- 
grams had problems running in protected mode. 


If MS-DOS were less sensitive to physical addressing, most applications could be
easily ported to 80286 protected mode. Operating systems such as Concurrent CP/M
and Microsoft Windows created environments that relied less on the idiosyncrasies
of real mode, but because of DOS’s wide popularity the marketplace demanded
support of real mode.


V86 mode was Intel’s response. V86 mode is available in the 80386, the 80386SX, 
and the 80486. The paging and multitasking capabilities of these processors enabled 
designers to implement V86 mode, which overcomes the 1-MB nonprotected limita- 
tions of real mode. Because a TSS contains an image of all the general registers, it is 
the basis of a register image for a virtual machine (in this case, an 8086). Addition- 
ally, the TSS contains the extra information needed for protected mode: the inner- 
ring stack pointers and the page directory base register (CR3). The operating system 
creates a V86 task by setting the VM bit in the EFLAGS image of the task’s TSS. 


When a task is invoked and the EFLAGS register is loaded (setting the processor’s 
VM bit), the task’s code portion behaves as if it were running in real mode. The task 
does not use descriptors; base addresses are generated by multiplying the selector 
value by 16. The difference between real mode and V86 mode is that real-mode ad- 
dresses are physical addresses and V86 mode addresses are linear addresses that 
can be mapped via paging hardware. 
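The distinction between a linear and a physical address can be illustrated with a toy page map in Python. Everything here is a simplification for the sake of the example: the dictionaries stand in for real page tables, and the frame numbers 500H and 700H are arbitrary.

```python
PAGE_SIZE = 4096


def v86_linear(selector, offset):
    """V86 tasks form addresses exactly as in real mode: selector * 16 + offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)


def translate(linear, page_map):
    """Map a linear address to a physical one through a per-task page map."""
    frame = page_map[linear // PAGE_SIZE]
    return frame * PAGE_SIZE + linear % PAGE_SIZE


# Two V86 tasks use the same real-mode address but own different physical pages.
task_a_pages = {0xB8: 0x500}
task_b_pages = {0xB8: 0x700}
linear = v86_linear(0xB800, 0x0010)
print(f"{linear:05X}H -> {translate(linear, task_a_pages):06X}H, "
      f"{translate(linear, task_b_pages):06X}H")
```

Both tasks compute the same linear address, B8010H, yet each reaches its own physical memory.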


Thus, the executing program makes the same assumptions about selectors and ad- 
dresses that a real-mode program does, but the paging hardware, under control of 
the native-mode supervisor, controls which physical addresses are used by the V86 
task. The entire 4-GB address space is available for remapping the V86 task’s ad- 
dresses. The other issue that Intel’s designers had to face was the integration of real- 
mode programs into a secure, protected-mode environment. 


Memory references were not a problem. The paging hardware can isolate the V86- 
mode program address space from protected-mode programs, preventing data cor- 
ruption. Besides memory, the only external interfaces to the CPU are I/O ports and 
interrupts. 


I/O in V86 mode 


In protected mode, the I/O privilege level (IOPL) determines whether a procedure
can perform I/O instructions. In V86 mode, IOPL protects the interrupt flag (IF), 
and I/O port protection is performed through the I/O permission bits in the TSS. 
V86 mode programs run in ring 3; thus, they cannot alter the value of IOPL. 


The CPL of a V86 mode task is always 3. If the system IOPL is less than 3, the
instructions below return a general protection fault (interrupt 13) with an error code
of 0. I/O instructions are not IOPL sensitive in V86 mode.


CLI     POPF
INT     PUSHF
IRET    STI
LOCK


If the system runs with an IOPL of 3, the V86 mode task will execute the
instructions above without triggering the general protection fault. This creates a
problem because these instructions modify the interrupt flag. Although performance
might be higher when IOPL = 3, this operating mode is not recommended. Allowing a
V86-mode task to disable interrupts could result in data loss or a system shutdown.
For example, the following two-line assembly program locks the system and
requires a complete power cycle to bring the system back on line:


        cli
l1:     jmp     l1


Designing a reliable system that runs V86-mode tasks with IOPL = 3 requires hard- 
ware support and cannot be implemented with software alone. For example, a 
watchdog timer can be connected to the NMI interrupt, forcing control back to the 
operating system if an application appears to have crashed the system. 


The I/O permission bitmap of the V86 task state segment determines whether the 
I/O instruction executes or causes an exception. Figure 7-5 illustrates a typical I/O 
permission bitmap in a V86 task state segment. 


Figure 7-5. I/O permission bitmap.


A trade-off exists between performance and protection. If you allow all tasks to 
issue I/O instructions, more than one task might access a device simultaneously. 
However, if you trap all I/O instructions, programs might run slowly. A compromise 
is to mark I/O address space as inaccessible until the first fault occurs. By trapping 
the first I/O instruction to a given port, the operating system can determine 
whether another task is using the device. If not, the permission bits for the faulting 
task can be modified to grant access to the specific device, and the task can resume 
processing at full speed. If some other task is accessing the device, the faulting task 
can be suspended or terminated. 
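The trap-then-grant compromise can be sketched in Python with the bitmap held as a byte array. The function names and the sample port 3F8H are assumptions for this illustration. The sketch follows the convention that a bit set to 1 denies access to the corresponding port and that every port touched by a multibyte access must be permitted.

```python
def io_allowed(bitmap, port, width=1):
    """True if every port covered by the access has its permission bit clear."""
    for p in range(port, port + width):
        byte_index, bit = divmod(p, 8)
        if byte_index >= len(bitmap) or bitmap[byte_index] & (1 << bit):
            return False
    return True


def grant(bitmap, port):
    """Clear the permission bit so later accesses to the port run at full speed."""
    bitmap[port // 8] &= ~(1 << (port % 8)) & 0xFF


bitmap = bytearray(b"\xFF" * 8192)      # start with all 65,536 ports inaccessible
first_try = io_allowed(bitmap, 0x3F8)   # False: this access would fault
grant(bitmap, 0x3F8)                    # fault handler decides the port is free
second_try = io_allowed(bitmap, 0x3F8)  # True: no further faults for this port
print(first_try, second_try)
```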


Memory-mapped devices must be controlled through paging hardware. Pages that
correspond to device addresses can be marked “not present” to cause a fault, or
they can be mapped to other devices or memory locations for subsequent
processing. (The latter is effective for display devices.)


Interrupt handling in V86 mode 


Because V86 mode is part of the protected-mode environment, interrupts are 
handled through the standard protected-mode IDT. The interrupt causes the pro- 
cessor to switch to an inner-ring stack segment. The stack segment’s selector is 
taken from the TSS and is a standard protected-mode selector, as opposed to the 
value of SS that the V86 mode task is using. Hardware interrupts are fielded by the 
routines or tasks designated by the gates in the IDT. Software interrupt instructions 
in the V86 task usually refer to routines in the virtual machine operating system; 
they are unlikely to correspond to the vectors implemented by the supervisory 
operating system. Therefore, any operating system that supports V86 tasks must be 
aware of two possible outcomes of a software INT instruction executed by a V86 
mode program. 


The more likely outcome is a general protection fault (interrupt 13). Because V86
tasks execute at privilege level 3, accessing a more privileged ring’s descriptor 
causes a general protection fault. The interrupt 13 fault handler must detect when it 
has been invoked due to a software interrupt instruction from a V86 task. 


The error code on the stack indicates the vector that caused the general protection 
fault. The handler can fetch the contents of the V86 interrupt vector from the V86 
task image and branch back to the V86 routine. 
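When the error code does identify the vector, it does so in the standard selector-style format: bit 0 is the external-event flag, bit 1 indicates that the index refers to the IDT, bit 2 is the table indicator, and the remaining bits hold the index. The Python sketch below decodes such a value. The function names are invented for this example, and a handler that receives an error code of 0 would have to identify the interrupt some other way, for instance by examining the faulting instruction.

```python
def decode_error_code(code):
    """Split a selector-style error code into its fields."""
    return {
        "ext": code & 1,           # external event
        "idt": (code >> 1) & 1,    # index refers to the IDT
        "ti": (code >> 2) & 1,     # table indicator (GDT or LDT) when idt is 0
        "index": code >> 3,        # descriptor or vector index
    }


def faulting_vector(code):
    """Return the interrupt vector named by the error code, or None."""
    fields = decode_error_code(code)
    return fields["index"] if fields["idt"] else None


code = 0x21 * 8 + 2                # error code that names IDT entry 21H
print(f"{code:04X}H -> vector {faulting_vector(code):02X}H")
```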


A less likely outcome occurs only when IOPL = 3 and when the gate in the IDT has 
a level 3 descriptor. In this case, the software interrupt causes a branch to the rou- 
tine pointed to by the gate. This routine must be in ring 0 to prevent a general pro- 
tection fault. Any interrupt routine that can be invoked by a level 3 gate in the IDT 
must examine the VM bit in the EFLAGS image on the stack to determine whether 
the interrupt handler was invoked by a standard protected-mode routine or by a 
V86 task. 


Whenever an interrupt occurs while the processor is executing a V86 mode task, 
control moves to a ring 0 code segment. Control may transfer directly to ring 0, or it 
may transfer to the general protection fault handler (which must be in ring 0). The 
ring 0 stack is slightly different when control comes from a V86 task than when it
comes from a protected-mode procedure. All segment registers are pushed onto the
ring 0 stack when an interrupt or a trap occurs in a V86 task. Figure 7-6 illustrates
the differences in the stacks. Notice that an error code will also be pushed for
certain exception interrupts.


In addition to the extra values pushed onto the stack, all segment registers are
reloaded during the transition through the gate. DS, ES, FS, and GS are loaded with
a null selector (0), SS is loaded from the ring 0 stack selector in the TSS for the V86
task, and CS is loaded with the descriptor from the interrupt or task gate.


Figure 7-6. Ring 0 interrupt stacks: protected mode vs. V86 mode. (Left: interrupt
stack after transition to ring 0 in protected mode. Right: interrupt stack after
transition to ring 0 in V86 mode, which also holds the real-mode selectors. In both
cases SS:ESP is loaded from the TSS.)


The segment registers must be loaded with new values if the executing task is a 
V86 task. Before an interrupt, the segment registers contain real-mode style segment 
addresses, which are not valid selectors for the protected-mode interrupt handler. 
When the interrupt handler returns via the IRET instruction, the CPU checks the 
saved EFLAGS image in the level 0 stack. If the saved VM bit is set, the CPU recog- 
nizes that it is returning to a V86 mode task and reloads the segment registers with 
the saved values on the stack. 
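The decision made at IRET can be modeled in Python by treating the ring 0 stack as a list of dwords, lowest address first. The frame layouts below assume an interrupt that arrived from ring 3 with no error code on the stack. The names are invented for this sketch, and VM is bit 17 of EFLAGS.

```python
EFLAGS_VM = 1 << 17   # VM bit in EFLAGS

# Stack contents from the top of the stack (lowest address) upward.
PROTECTED_FRAME = ["EIP", "CS", "EFLAGS", "ESP", "SS"]
V86_FRAME = PROTECTED_FRAME + ["ES", "DS", "FS", "GS"]


def frame_layout(saved_eflags):
    """Choose the frame layout the way IRET does: by the saved VM bit."""
    return V86_FRAME if saved_eflags & EFLAGS_VM else PROTECTED_FRAME


def parse_frame(dwords):
    """Label the dwords found on the ring 0 stack."""
    return dict(zip(frame_layout(dwords[2]), dwords))


v86 = parse_frame([0x0100, 0x2000, 0x20202, 0xFFFE, 0x3000, 1, 2, 3, 4])
pm = parse_frame([0x0100, 0x001B, 0x00202, 0x1000, 0x0023])
print(sorted(v86), sorted(pm))
```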
REFERENCE SECTION


This chapter of Microsoft's 80386/80486 Programming Guide provides a reference 
for the instruction sets. The instructions are in alphabetic order, with floating-point 
instructions following the basic instructions. 


The experienced user can find information with a quick glance at the first part of an
instruction; a less experienced user can refer to the detailed descriptions and
examples.


Operators


The following reference pages use these operators:

Operator    Meaning            Operator    Meaning

+           Addition           &           Boolean AND
-           Subtraction        >           Greater than
*           Multiplication     <           Less than
÷           Division           >>          Shift right
~           Not                <<          Shift left
=           Equal to           ≤           Less than or equal to
!=          Not equal to       ≥           Greater than or equal to
|           Or                 ←           Assignment
^           Exclusive OR


Each reference entry contains the following fields:

MNEMONIC. Used by the assembler to represent the instruction.

NAME. Name of the instruction.

PROCESSOR TYPE. Processors that support the instruction. Note that earlier
processors supported only 8-bit or 16-bit forms.

OPERAND SIZES. When many different operands may be used, this field indicates
legal sizes. If the instruction requires more than one operand, they are assumed to
be the same size. Unless otherwise stated, 8 = 8-bit operands; 16 = 16-bit operands;
32 = 32-bit operands; 16p = the instruction accepts 16-bit operands by using the
32-bit form and the OPSIZ instruction prefix.

SYNTAX. Generic instruction format.

OPERATION. Pseudocode operation description.

LEGAL FORMS. Legal forms of the instruction. reg = one of the general registers
(EAX, ESI, BX, DL, BP, DX, etc.); mem = a memory operand ([021AH],
[EBP + EAX + 3], [ECX + 7], etc.); idata = an immediate data value (32, 17A3H,
etc.); sreg = a segment register; offset = an offset from the current CS:IP.

DESCRIPTION. Description of the instruction.

FLAGS. OF = Overflow flag. DF = Direction flag. IF = Interrupt enable flag.
TF = Trap flag. SF = Sign flag. ZF = Zero flag. AF = Auxiliary flag. PF = Parity flag.
CF = Carry flag. An “x” in a box indicates that the specified bit is modified by the
instruction. A “-” in a box means that the specified bit value remains unchanged.
A “?” means that the instruction sets the flag to an unknown value. If a “0” or “1” is
in a box, the instruction sets the specified bit to that value.

FAULTS. Faults that may be triggered by the instruction. The abbreviations used
include: #UD (undefined opcode), #NP (not present), #TS (task switch), #GP (general
protection), #SS (stack fault), #PF (page fault), and #AC (alignment check; 80486
only). A value in parentheses indicates that an error code is pushed onto the stack.

EXAMPLE. Code that illustrates use of the instruction.


The sample entry below shows these fields:

8: Reference Section

CALL                                        8086/80186/80286/80386/80486
Near Procedure Call                                             (16p/32)

Syntax

    CALL dest

Operation

    push(EIP)
    EIP ← dest

Legal Forms

    dest
    CALL offset     ; EIP ← EIP + offset
    CALL mem        ; EIP ← [mem]
    CALL reg        ; EIP ← [reg]

Description

This instruction pushes the address of the next instruction (EIP) onto the stack. The
instruction pointer is then set to the value specified by the operand.

If the operand is an immediate value, the new instruction pointer is relative to the
current position. If the operand is a memory address or a register, the subroutine
address is taken indirectly from the operand.

Flags

    OF  DF  IF  TF  SF  ZF  AF  PF  CF
    -   -   -   -   -   -   -   -   -

Faults

            PM          RM          V8086
    13      #GP(0)      INT 13      #GP(0)
    14      #PF(ec)                 #PF(ec)
    17      #AC(0)                  #AC(0)

Examples

    CALL SORT               ; Call direct
    ...                     ; Get pointer to address table
    ...                     ; Select third function
    CALL [EBX+EAX*4]        ; Call it



AAA 8086/80186/80286/80386/80486 
ASCII Adjust After Addition (8) 
Syntax 
AAA 
Operation 


if (AF | ((AL & 0FH) > 9)) then
    AL ← (AL + 6) & 0FH
    AH ← AH + 1
    CF, AF ← 1
else
    CF, AF ← 0
endif


Legal Form 
AAA 


Description 


This instruction ensures that an ASCII or BCD addition results in a valid BCD digit. 
After executing an ADD or ADC instruction that leaves a single BCD or ASCII digit 
in register AL, execute AAA to produce a valid BCD result.


If the value in AL produces a decimal overflow, the BCD digit is forced into the legal 
range (0-9), and AH is incremented. The high-order nibble is zeroed so that AL 
contains only the resulting single BCD digit, and the AF and CF flags are set to 1. 


If no overflow occurs, the AF and CF flags are reset to 0. 
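
As a rough cross-check of the Operation listing, the adjustment can be modeled in a few
lines of Python. This is only an illustrative sketch: the helper name `aaa` and its
return tuple are assumptions made for this example, and clearing the high nibble in the
no-overflow branch follows the Description text rather than the pseudocode.

```python
def aaa(al, ah, af):
    """Model AAA; returns (AL, AH, AF, CF). Hypothetical helper for illustration."""
    if af or (al & 0x0F) > 9:
        al = (al + 6) & 0x0F
        ah = (ah + 1) & 0xFF
        return al, ah, 1, 1
    # Assumption: the high nibble is zeroed here too, as the Description states.
    return al & 0x0F, ah, 0, 0

# '5' + '7' = 35H + 37H = 6CH, the case used in the Example
print(aaa((ord('5') + ord('7')) & 0xFF, 0, 0))  # (2, 1, 1, 1)
```

The result matches the Example: AL becomes 02H, AH is incremented, and the decimal
carry is set.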


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV AL, '5'     ; Binary 35H
ADD AL, '7'     ; Add binary 37H yielding 6CH
AAA             ; AL ← 02H, AH ← AH + 1, decimal carry set
OR  AL, 30H     ; Convert resulting digit to ASCII '2'
AAD 8086/80186/80286/80386/80486
ASCII Adjust Before Division (16)

Syntax
AAD

Operation

AL ← AH * 10 + AL
AH ← 0


Legal Form 
AAD 


Description 


This instruction supports BCD division. Before execution, the AL register should 
contain a single, unpacked BCD digit. The AH register should hold the next higher- 
order BCD digit. After executing the AAD instruction, AX contains the binary 
equivalent of the two BCD digits. You can then issue the divide instruction, which 
leaves a binary result. 
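
The conversion is easy to check by hand. The sketch below models the Operation listing
in Python and then performs the division from the Example; the helper name `aad` is an
assumption made for illustration only.

```python
def aad(ah, al):
    """Model AAD; returns (AH, AL) after the adjustment."""
    return 0, (ah * 10 + al) & 0xFF

ah, al = aad(4, 2)                    # unpacked BCD digits 4 and 2
quotient, remainder = divmod(al, 6)   # the DIV BL step with BL = 6
print(al, quotient, remainder)        # 42 7 0
```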


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV AH, '4'     ; High-order digit
MOV AL, '2'     ; Low-order digit (AX = ASCII 42)
AND AX, 0F0FH   ; Convert to unpacked BCD
AAD             ; AX ← 2AH (42 decimal)
MOV BL, 6       ; Divisor for 42/6
DIV BL          ; AL ← 7 (quotient), AH ← 0 (remainder)
OR  AL, 30H     ; Convert result to ASCII '7'
AAM 8086/80186/80286/80386/80486
ASCII Adjust After Multiplication (8)
Syntax
AAM
Operation

AH ← AL div 10
AL ← AL mod 10


Legal Form 
AAM 


Description 

The AAM instruction converts the result of a single-digit BCD multiplication (a 
value 0-81) in the AX register to two unpacked BCD digits, the high-order digit in 
AH and the low-order digit in AL. 
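
A minimal Python model of the Operation listing, using the 4 x 8 product from the
Example. The helper name `aam` is an assumption made for this sketch.

```python
def aam(al):
    """Model AAM; returns (AH, AL) as two unpacked BCD digits."""
    return divmod(al, 10)

print(aam(4 * 8))  # (3, 2)
```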


Flags
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV AL, 4       ; Multiplicand
MOV AH, 8       ; Multiplier
MUL AH          ; AX ← 20H, 32 decimal
AAM             ; AH ← 3, AL ← 2
OR  AX, 3030H   ; Convert to ASCII '32'
AAS 8086/80186/80286/80386/80486
ASCII Adjust After Subtraction (8)
Syntax
AAS
Operation

if (AF | (AL & 0FH) > 9) then
    AL ← (AL - 6) & 0FH
    AH ← AH - 1
    CF, AF ← 1
else
    CF, AF ← 0
endif


Legal Form 
AAS 


Description 


This instruction ensures that an ASCII or BCD subtraction results in a valid BCD 
digit. After executing a SUB or SBB instruction that leaves a single BCD or ASCII 
digit in register AL, execute AAS to produce a valid BCD result. 


If the value in AL produces a decimal borrow, the BCD digit is forced into the legal 
range (0-9) and AH is decremented. The high-order nibble is zeroed so that AL 
contains only the resulting single BCD digit, and the AF and CF flags are set to 1. 


If no borrow occurs, the AF and CF flags are reset to 0. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV AL, '5'     ; 35H
SUB AL, '7'     ; Subtract 37H yielding 0FEH
AAS             ; AL ← 08H, carry set indicating "borrow"
OR  AL, 30H     ; Convert result back to ASCII '8'
ADC 8086/80186/80286/80386/80486
Add with Carry (8/16p/32)
Syntax

ADC dest, src

Operation

dest ← dest + src + CF

Legal Forms

     dest  src
ADC  reg,  idata
ADC  mem,  idata
ADC  reg,  reg
ADC  reg,  mem
ADC  mem,  reg
Description 


This instruction adds the contents of the dest and src operands, increments the 
result by 1 if the carry flag is set, and stores the result in the location specified by 
dest. The operands must be of the same size. If the operands are signed integers, the 
OF flag indicates an invalid result. If the operands are unsigned, the CF flag indi- 
cates a carry out of the destination. 
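
The usual reason to reach for ADC is multiword arithmetic: an ADD on the low-order
halves produces a carry that ADC folds into the high-order halves. The Python sketch
below models that pairing for two 64-bit values held as 32-bit halves; the helper names
`add32` and `add64` are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF

def add32(a, b, carry_in=0):
    """Return (32-bit result, carry out), like ADD (carry_in=0) or ADC."""
    total = a + b + carry_in
    return total & MASK32, total >> 32

def add64(lo1, hi1, lo2, hi2):
    lo, cf = add32(lo1, lo2)        # ADD: low-order halves, generates carry
    hi, cf = add32(hi1, hi2, cf)    # ADC: high-order halves plus previous carry
    return lo, hi, cf

print(add64(0xFFFFFFFF, 0, 1, 0))   # (0, 1, 0)
```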


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
; Subroutine to add two 64-bit integers
ENTER 0, 0              ; Create stack frame
MOV   EAX, [EBP+8]      ; Get low-order of first value
MOV   EDX, [EBP+12]     ; Get high-order of first value
ADD   EAX, [EBP+16]     ; Add low-order bits, generating carry
ADC   EDX, [EBP+20]     ; Add high-order bits with previous carry
LEAVE                   ; Undo stack frame
RET                     ; Return with value in EDX:EAX
ADD 8086/80186/80286/80386/80486
Integer Addition (8/16p/32)
Syntax

ADD dest, src

Operation

dest ← dest + src

Legal Forms

     dest  src
ADD  reg,  idata
ADD  mem,  idata
ADD  reg,  reg
ADD  reg,  mem
ADD  mem,  reg
Description 


This instruction adds the contents of the dest and src operands and stores the result 
in the location specified by dest. The operands must be of the same size. If the 
operands are signed integers, the OF flag indicates an invalid result. If the operands 
are unsigned, the CF flag indicates a carry out of the destination. If the operands are 
unpacked BCD digits, the AF flag indicates a decimal carry. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
ADD AL, [4211A]         ; 8-bit addition
ADD AX, 34              ; 16-bit immediate value addition
ADD ESI, [EBP+8]        ; 32-bit memory addition to register
AND 8086/80186/80286/80386/80486
Boolean AND (8/16p/32)
Syntax

AND dest, src

Operation

dest ← dest & src
CF ← 0
OF ← 0

Legal Forms

     dest  src
AND  reg,  idata
AND  mem,  idata
AND  reg,  reg
AND  reg,  mem
AND  mem,  reg
Description 


This instruction performs a bit-by-bit AND operation on the dest and src operands 
and stores the result in the dest operand. The AND operation is defined as follows: 


0 & 0 = 0
0 & 1 = 0
1 & 0 = 0
1 & 1 = 1


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
AND AL, 0FH                 ; Zero high-order nibble of AL
AND EBX, ECX                ; Compute EBX ← EBX & ECX
AND BYTE PTR [EBP+6], 7FH   ; Mask off high-order bit of memory operand
ARPL 80286/80386/80486
Adjust RPL Field of Selector (16)
Syntax

ARPL dest, src

Operation

if (dest.RPL < src.RPL) then
    dest.RPL ← src.RPL
    ZF ← 1
else
    ZF ← 0
endif


Legal Forms 


dest src 
ARPL reg, reg 
ARPL mem, reg 
Description 


System software uses this instruction to modify a selector’s requested privilege level
(RPL) field. Both the dest and src operands must be valid selectors.


If the RPL of the dest operand is numerically less than the RPL of the src, that is, if 
the dest selector is more privileged, the dest selector’s RPL is changed to match that 
of the src, and the ZF flag is set to 1. If the dest selector is less privileged (numeri- 
cally higher) than the src, the ZF flag is cleared to 0, and the dest operand is not 
modified. 


Operating system routines that are passed selectors from applications should use
ARPL to ensure that the calling routine has not passed a selector with a higher
privilege than the application is allowed. Use the calling routine’s CS register as the
src operand.
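
The RPL occupies the low two bits of a selector, so the adjustment in the Operation
listing can be sketched as a small Python function. The helper name `arpl` and the
sample selector values are assumptions chosen only for illustration.

```python
RPL_MASK = 0x0003

def arpl(dest, src):
    """Model ARPL on 16-bit selectors; returns (new dest, ZF)."""
    if (dest & RPL_MASK) < (src & RPL_MASK):
        # dest claims more privilege than src allows: raise its RPL to match
        return (dest & 0xFFFC) | (src & RPL_MASK), 1
    return dest, 0

print(arpl(0x0008, 0x001B))   # (11, 1): RPL raised from 0 to 3
print(arpl(0x001B, 0x0008))   # (27, 0): dest left unchanged
```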


Flags 
OF DF IF TF SF ZF 
Faults
      PM        RM        V8086
6               INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)
17    #AC(0)              #AC(0)
Example
MOV  AX, [EBP+12]       ; Get parameter off the stack
ARPL AX, [EBP+2]        ; Adjust to caller’s RPL (previous CPL) by
                        ; using CS of return address on stack
JNZ  bad_param          ; Branch if caller passed a bad selector
BOUND 80186/80286/80386/80486
Check Array Boundaries (16p/32)
Syntax

BOUND dest, src

Operation

if ((dest < src[0]) | (dest > src[1])) then
    INT 5
endif


Legal Forms 


dest src 
BOUND reg, mem 
Description 


This instruction compares the dest operand, which must be a register containing a 
signed integer, with two values, a lower bound stored at the address specified by src, 
and an upper bound stored in the following location. The bounds can be 16-bit or 
32-bit values. 


If the dest value is less than the lower bound or greater than the upper bound, an in- 
terrupt 5 occurs. The return address pushed onto the stack by the exception is the 
starting address of the BOUND instruction that caused the interrupt. 


Flags 
OF DF IF TF SF ZF


Faults 
      PM        RM        V8086
5     INT 5     INT 5     INT 5
6*    #UD()     INT 6     #UD()
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


*The undefined opcode fault occurs only if the instruction 
encoding of the BOUND instruction specifies an src operand 
that is a register. 
Example

VC_LIMITS:
        DD      61, 20              ; Bounds for 20-element array
VC      DD      20 DUP (?)          ; Array storage area

        MOV     EAX, [EBP-6]        ; Get array index
        BOUND   EAX, VC_LIMITS      ; Check against limits
BSF 80386/80486
Bit Scan Forward (16p/32)
Syntax

BSF dest, src

Operation

if (src = 0) then
    ZF ← 1
    dest ← ???
else
    ZF ← 0
    temp ← 0
    while (bit(src, temp) = 0)
        temp ← temp + 1
    dest ← temp
endif


Legal Forms 


dest src 
BSF reg, reg 
BSF reg, mem 
Description 


This instruction scans the src operand and writes the bit position of the first 1-bit in 
src to the dest register. If the src operand is 0, the ZF flag is set to 1, and the instruc- 
tion ends with the dest register in an undefined state. 


If the src operand is not 0, each bit is examined, beginning with bit 0, until a 1-bit is
found. The bit position of the first 1-bit (index) is stored in the dest register.
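
The scan in the Operation listing translates almost line for line into Python. In this
sketch the helper name `bsf` is an assumption, and `None` stands in for the undefined
dest value when src is 0.

```python
def bsf(src):
    """Model BSF; returns (ZF, dest). dest is None when src is 0 (undefined)."""
    if src == 0:
        return 1, None
    temp = 0
    while (src >> temp) & 1 == 0:
        temp += 1
    return 0, temp

print(bsf(0b101000))  # (0, 3)
print(bsf(0))         # (1, None)
```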


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)

Example

        XOR     ECX, ECX                ; Index into sector map
L1:     BSF     EAX, SECTORS[ECX*4]     ; Scan a dword
        JNZ     GOT_ONE                 ; Branch if any bits set
        INC     ECX                     ; Go on to next dword
        CMP     ECX, TABLE_SIZE         ; Done searching?
        JL      L1                      ; No, scan next table entry
        JMP     NO_SECTORS              ; No bits set in entire table
GOT_ONE:
BSR 80386/80486
Bit Scan Reverse (16p/32)
Syntax

BSR dest, src

Operation

if (dest in [AX, BX, CX, DX, SI, DI, BP, SP]) then
    startbit ← 15
else
    startbit ← 31
endif
if (src = 0) then
    ZF ← 1
    dest ← ???
else
    ZF ← 0
    temp ← startbit
    while (bit(src, temp) = 0)
        temp ← temp - 1
    dest ← temp
endif


Legal Forms 


dest src 
BSR reg, reg 
BSR reg, mem 
Description 


This instruction scans the src operand in reverse, searching for a 1-bit beginning at 
the high order of the src operand. If the src operand is 0, the ZF flag is set to 1, and
the instruction ends with the dest register in an undefined state. 


If the src operand is not 0, each bit is examined, beginning with the high-order bit 
(either 15 for word operands or 31 for doubleword operands), until a 1-bit is found. 
The bit position (index) of the first 1-bit is stored in the dest register. 
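
A Python sketch of the reverse scan follows. The helper name `bsr` and its `width`
parameter (16 for word operands, 32 for doubleword operands) are assumptions made for
illustration; `None` stands in for the undefined dest value.

```python
def bsr(src, width=32):
    """Model BSR; returns (ZF, dest). dest is None when src is 0 (undefined)."""
    if src == 0:
        return 1, None
    temp = width - 1              # startbit: 15 or 31
    while (src >> temp) & 1 == 0:
        temp -= 1
    return 0, temp

print(bsr(0b101000))        # (0, 5)
print(bsr(0x8001, 16))      # (0, 15)
```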


Flags 
OF DF IF TF SF ZF AF PF CF 
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
        MOV     ECX, SEM_MAX-1          ; Index of last entry in
                                        ; semaphore table
L1:     BSR     EAX, SEMAPHORE[ECX*4]   ; Scan for non-zero bits
        JNZ     found_it                ; Branch if valid index
        LOOP    L1                      ; Decrement CX, loop back
                                        ; if not zero
none_found:                             ; Get here
                                        ; if entire table is zero
BSWAP 80486
Byte Swap (32)
Syntax
BSWAP reg
Operation

temp ← dest
dest[0..7] ← temp[24..31]
dest[8..15] ← temp[16..23]
dest[16..23] ← temp[8..15]
dest[24..31] ← temp[0..7]


Legal Form 


dest 
BSWAP reg32 


Description 


The order of the four bytes in the 32-bit register operand is reversed. This converts
between “big-endian” and “little-endian” storage formats. This instruction is
useful when exchanging data between processors with different architectures.
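
The byte shuffle in the Operation listing can be written out with shifts and masks.
The Python sketch below is only a model of that listing; the helper name `bswap` is an
assumption.

```python
def bswap(value):
    """Reverse the four bytes of a 32-bit value, as in the Operation listing."""
    b0 = value & 0xFF             # bits 0..7
    b1 = (value >> 8) & 0xFF      # bits 8..15
    b2 = (value >> 16) & 0xFF     # bits 16..23
    b3 = (value >> 24) & 0xFF     # bits 24..31
    return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3

print(hex(bswap(0x12345678)))     # 0x78563412
```

Applying the swap twice returns the original value, which is why the same instruction
serves for conversion in both directions.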


Flags

None.


Faults 


None. 


Example 


CALL  getdata   ; Read 32 bits from the network into EAX
BSWAP EAX       ; Convert to local format
STOSD           ; Write to buffer
LOOP  getmore
BT 80386/80486
Bit Test (16p/32)
Syntax

BT dest, index

Operation
CF ← BIT(dest, index)


Legal Forms 


dest index 
BT reg, idata 
BT mem, idata 
BT reg, reg 
BT mem, reg 
Description 


This instruction tests the bit specified by the operands and places the value of the 
bit into the carry flag. 


The index operand holds a bit index into the bit string specified by dest, which can 
be a 16-bit or 32-bit register or a memory location. The state of the bit is copied into 
the carry flag. 


If the index operand is an immediate data value, it can range from 0 through 31. If 
the index is held in a register, it can take on any integral value. Some assemblers 
might let you specify immediate index values greater than 31. If so, they modify the 
effective address by an appropriate value so that the index can be scaled back to 
between 0 and 31.
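
To make the bit-string addressing concrete, the sketch below models a memory bit string
as a Python bytearray and picks out the byte and bit that a given index selects. The
helper name `bt` and the 256-bit table are assumptions for illustration, and only
non-negative indexes are handled here.

```python
def bt(bit_string, index):
    """Return the bit that BT would copy into CF (non-negative index only)."""
    byte = bit_string[index >> 3]          # byte holding the selected bit
    return (byte >> (index & 7)) & 1       # bit position within that byte

semaphores = bytearray(32)                 # 256 flag bits, all clear
semaphores[192 >> 3] |= 1 << (192 & 7)     # set bit number 192
print(bt(semaphores, 192), bt(semaphores, 193))   # 1 0
```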


BT does not accept byte operands, so do not use it with memory-mapped I/O de- 
vices because the instruction causes either the 16-bit word or the 32-bit word con- 
taining the selected bit to be read. This could affect more than one I/O device 
register. You should use a single-byte MOV instruction to read the I/O register and 
then test the contents of the register. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
MOV EAX, 192            ; Bit index
BT  SEMAPHORES, EAX     ; Test semaphore number 192
JC  sem_set             ; Branch if the bit was set
BTC 80386/80486
Bit Test and Complement (16p/32)

Syntax
BTC dest, index

Operation
CF ← BIT(dest, index)
BIT(dest, index) ← ~BIT(dest, index)


Legal Forms 


dest index 
BTC reg, idata 
BTC mem, idata 
BTC reg, reg 
BTC mem, reg 
Description 


This instruction copies the bit specified by the operands into CF, then complements
the original value of the bit in the dest operand.


The index operand holds a bit index into the bit string specified by dest, which can 
be a 16-bit or 32-bit register or a memory location. The state of the bit is copied into 
the carry flag, and the bit of the dest operand is complemented. 


If the index operand is an immediate data value, it can range from 0 through 31. If 
the index is held in a register, it can take on any integral value. Some assemblers 
might let you specify immediate index values greater than 31. If so, they modify the 
effective address by an appropriate value so that the index can be scaled back to 
between 0 and 31.


BTC does not accept byte operands, so do not use it with memory-mapped I/O de- 
vices because the instruction causes either the 16-bit word or the 32-bit word con- 
taining the selected bit to be read. This could affect more than one I/O device 
register. You should use a single-byte MOV instruction to read the I/O register and 
then test the contents of the register. 


Flags 
OF DF IF TF SF ZF AF PF CF
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
MOVZX EAX, BYTE PTR [04A2H]     ; Read memory byte into 32-bit register
BTC   EAX, 2                    ; Test and complement bit number 2
MOV   [04A2H], AL               ; Write modified byte back to memory
JC    bitset                    ; Branch if the bit was set
BTR 80386/80486
Bit Test and Reset (16p/32)
Syntax

BTR dest, index
Operation

CF ← BIT(dest, index)
BIT(dest, index) ← 0


Legal Forms 


dest index 
BTR reg, idata 
BTR mem, idata 
BTR reg, reg 
BTR mem, reg 
Description 


This instruction copies the bit specified by the operands into CF, then clears the 
original bit in dest to 0. 


The index operand holds a bit index into the bit string specified by dest, which can 
be a 16-bit or 32-bit register or a memory location. The state of the bit is copied into 
the carry flag, and the bit of the dest operand is cleared to 0. 


If the index operand is an immediate data value, it can range from 0 through 31. If 
the index is held in a register, it can be any integer. Some assemblers might let you 
specify immediate index values greater than 31. If so, they modify the effective ad- 
dress by an appropriate value so that the index can be scaled back to between 0 
and 31. 


BTR does not accept byte operands, so do not use it with memory-mapped I/O de- 
vices because the instruction causes either the 16-bit word or the 32-bit word con- 
taining the selected bit to be read. This could affect more than one I/O device 
register. You should use a single-byte MOV instruction to read the I/O register and 
then test the contents of the register. 


When using a BTR instruction to implement a signaling function in a multiprocessor
environment, the LOCK instruction prefix should immediately precede any BTR
instruction that modifies shared memory.


Flags 
OF DF IF TF SF ZF AF PF CF 
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example 
BTR MY_FLAG, 7 ; Zero the high-order bit of byte MY_FLAG 
JNC NOT_SET ; Bit was already reset 
BTS 80386/80486
Bit Test and Set (16p/32)
Syntax

BTS dest, index

Operation

CF ← BIT(dest, index)
BIT(dest, index) ← 1


Legal Forms 


dest index 
BTS reg, idata 
BTS mem, idata 
BTS reg, reg 
BTS mem, reg 
Description

This instruction copies the specified bit into CF, then sets the original bit in
dest to 1.


The index operand holds a bit index into the bit string specified by dest, which can 
be a 16-bit or 32-bit register or a memory location. The state of the bit is copied into 
the carry flag, and the bit of the dest operand is set to 1. 


If the index operand is an immediate data value, it can range from 0 through 31. If 
the index is held in a register, it can be any integer. Some assemblers might let you 
specify immediate index values greater than 31. If so, they modify the effective ad- 
dress by an appropriate value so that the index can be scaled back to between 0 
and 31. 


BTS does not accept byte operands, so do not use it with memory-mapped I/O de- 
vices because the instruction causes either the 16-bit word or the 32-bit word con- 
taining the selected bit to be read. This could affect more than one I/O device 
register. You should use a single-byte MOV instruction to read the I/O register and 
then test the contents of the register. 


When using a BTS instruction to implement a semaphore function in a 
multiprocessor environment, the LOCK instruction prefix should immediately 
precede any BTS instruction that modifies shared memory. 
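
The test-and-set pattern behind a semaphore can be sketched in Python. This models only
the register form, where, as far as I know, the index is taken modulo the operand size;
it does not model the LOCK prefix or bus behavior. The helper name `bts` is an
assumption.

```python
def bts(value, index, width=32):
    """Model BTS on a register operand; returns (new value, CF)."""
    index %= width                    # register form: index wraps at operand size
    cf = (value >> index) & 1         # old bit goes to CF
    return value | (1 << index), cf

flags = 0
flags, first = bts(flags, 7)          # first caller: CF = 0, semaphore acquired
flags, second = bts(flags, 7)         # second caller: CF = 1, already held
print(hex(flags), first, second)      # 0x80 0 1
```

A caller that sees CF = 0 owns the semaphore; a caller that sees CF = 1 knows another
party set the bit first.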


Flags 
OF DF IF TF SF ZF AF PF CF 
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
BTS MY_FLAG, 7          ; Set the high-order bit of byte MY_FLAG
JC  WAS_SET             ; Bit was already set
CALL 8086/80186/80286/80386/80486
Far Procedure Call (32p/48)

Syntax
CALL dest

Operation

push(CS)
push(EIP)
CS:EIP ← dest

Legal Forms

      dest
CALL  idata     ; CS:EIP ← idata
CALL  mem       ; CS:EIP ← [mem]
Description 


The far procedure call saves the current code segment selector and the address of 
the next instruction (EIP) on the stack. Control then transfers to the destination 
specified by the operand. The operand can be an immediate selector:offset value or 
the address of a 48-bit FAR pointer in memory. 


The selector can point to another code segment, a call gate, a task gate, or a task 
state segment. If the selector points to a gate or TSS, the offset portion of the CALL 
is ignored. If the selector points to a code segment, control transfers to the specified 
offset within that segment. 


All flags are affected by a task switch. 


Flags 
OF DF IF TF SF ZF


Faults 
      PM          RM        V8086
10    #TS(0)
10    #TS(sel)              #TS(sel)
11    #NP(sel)              #NP(sel)
12    #SS(0)
12    #SS(SS)
13    #GP(0)      INT 13    #GP(0)
      #GP(CS)     INT 13    #GP(0)
14    #PF(ec)               #PF(ec)
17    #AC(0)                #AC(0)
Examples
CALL 16A3:0000              ; Direct call
CALL FWORD PTR [005AH]      ; Indirect call
CALL 8086/80186/80286/80386/80486
Near Procedure Call (16p/32)

Syntax
CALL dest

Operation

push(EIP)
EIP ← dest

Legal Forms

      dest
CALL  offset    ; EIP ← EIP + offset
CALL  mem       ; EIP ← [mem]
CALL  reg       ; EIP ← [reg]
Description 


This instruction pushes the address of the next instruction (EIP) onto the stack. The 
instruction pointer is then set to the value specified by the operand. 


If the operand is an immediate value, the new instruction pointer is relative to the 
current position. If the operand is a memory address or a register, the subroutine 
address is taken indirectly from the operand. 


Flags 
OF DF IF TF SF ZF


Faults 
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
CALL SQRT               ; Call direct
LEA  EBX, FN_TABLE      ; Get pointer to address table
MOV  EAX, 3             ; Select third function
CALL [EBX+EAX*4]        ; Call it
CBW 8086/80186/80286/80386/80486
Convert Byte to Word (8)

Syntax
CBW

Operation

if BIT(AL, 7) then
    AH ← 0FFH
else
    AH ← 0
endif


Legal Form 
CBW 


Description 
This instruction sign-extends the byte in AL to AX. 


Flags 
OF DF IF TF SF ZF 


Faults 

None. 

Example 

MOV AL, TINY ; Read a byte into AL 

CBW ; Convert to 16-bit signed integer 
ADD BX, AX 
CDQ 80386/80486
Convert Doubleword to Quadword (32)
Syntax
CDQ
Operation

if (BIT(EAX, 31) = 1) then
    EDX ← 0FFFFFFFFH
else
    EDX ← 0
endif


Legal Form 
CDQ 


Description 


This instruction sign-extends the 32-bit EAX register to a 64-bit quadword in EDX:EAX.
It is most frequently used before the integer divide instruction, which operates on a
64-bit dividend.
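
The effect of the sign extension is easiest to see by reassembling EDX:EAX into a single
signed number. In the Python sketch below, the helper names `cdq` and `to_signed64` are
assumptions made for illustration.

```python
def cdq(eax):
    """Model CDQ; returns (EDX, EAX)."""
    edx = 0xFFFFFFFF if eax & 0x80000000 else 0
    return edx, eax

def to_signed64(edx, eax):
    """Interpret EDX:EAX as a signed 64-bit integer."""
    value = (edx << 32) | eax
    return value - (1 << 64) if value & (1 << 63) else value

edx, eax = cdq(0xFFFFFFF6)                 # EAX holds -10 as a 32-bit value
print(hex(edx), to_signed64(edx, eax))     # 0xffffffff -10
```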


Flags 
OF DF IF TF SF ZF 


Faults 

None. 

Example 

MOV EAX, [400H] ; Copy dividend to EAX 
CDQ ; Extend to 64 bits 
IDIV DWORD PTR [20H] ; Divide 
CLC 8086/80186/80286/80386/80486
Clear Carry Flag ()

Syntax
CLC

Operation
CF ← 0

Legal Form
CLC


Description 
This instruction clears the carry flag in the EFLAGS register to 0. 


Flags 
OF DF IF TF SF ZF 


Faults 
None. 
Example 
NO_ERROR:
        CLC             ; Clear carry
        RET             ; Return from subroutine with success
                        ; indicated by CF
CLD 8086/80186/80286/80386/80486
Clear Direction Flag ()

Syntax
CLD

Operation
DF ← 0


Legal Form 
CLD 


Description 


This instruction clears the direction flag in the EFLAGS register to 0. When DF is 0, 
any string instructions increment the index registers (ESI or EDI). 


Flags 
OF DF IF TF SF ZF 


Faults 

None. 

Example 

MOV ECX, STR_LEN ; String move count 
CLD ; Clear direction flag 
REP MOVSB ; Copy the string 
CLI 8086/80186/80286/80386/80486
Clear Interrupt Flag ()

Syntax
CLI

Operation
IF ← 0


Legal Form 
CLI 


Description 


This instruction clears the interrupt bit in the EFLAGS register to 0, disabling
hardware interrupts (except NMI). The procedure executing the CLI instruction must be
of equal or higher privilege than the current IOPL, that is, CPL ≤ IOPL, or a general
protection fault occurs.


Flags 
OF DF IF TF SF ZF 


Faults 
      PM        RM        V8086
13    #GP(0)              #GP(0)
Example

        CLI                     ; Disable interrupts
        MOV     AL, SEMAPHORE   ; Get memory value
        DEC     AL              ; Decrement counter
        JZ      DONE            ; Skip if value was 0
        MOV     SEMAPHORE, AL   ; Update
DONE:
        STI                     ; Enable interrupts
CLTS 80286/80386/80486
Clear Task Switched Bit ()
Syntax
CLTS
Operation

BIT(CR0, 3) ← 0


Legal Form 
CLTS 


Description 

This instruction clears the task switched (TS) bit in the CR0 register to 0. The TS bit
allows the 80386 to efficiently manage the floating-point unit. Whenever a task
switch occurs, the CPU sets the TS bit to 1. If the TS bit is 1 when a coprocessor
escape (ESC) executes, a coprocessor not available fault (INT 7) occurs. A WAIT
instruction will also trigger INT 7 if both the TS and MP bits in CR0 are 1.


The fault handler can clear the TS bit, save the NDP state, load the NDP state for the 
current task, and return to the instruction that faulted. Switching between tasks that 
do not use floating point will not cause the fault, and you avoid the overhead of sav- 
ing and restoring the NDP state. 


Only procedures running at a CPL of 0 can execute CLTS without causing a general 
protection fault. 


CLTS is valid in real mode to allow initialization for protected mode. 


Flags 
OF DF IF TF SF ZF 


Faults 
      PM        RM        V8086
13    #GP(0)              #GP(0)
Example 
CLTS ; Clear task switched bit 
CALL SWAP_NDP_STATE ; Save/restore math coprocessor state 
CMC 8086/80186/80286/80386/80486
Complement the Carry Flag ()

Syntax
CMC

Operation
CF ← ~CF


Legal Form 
CMC 


Description 


The carry bit of the EFLAGS register is complemented; that is, if the initial value of
the carry bit is 0, it is set to 1. If the initial value is 1, the flag is cleared to 0 as a
result of the instruction.


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 
None. 
Example 
BT EAX, 1 ; Test a bit, save in CF 
JC EXIT ; Bit was set--we’re done 
JMP TRY_AGAIN ; Not ready yet 
EXIT: 
CMC ; Return, CF clear 
RET 
CMP 8086/80186/80286/80386/80486
Compare Integers (8/16p/32)
Syntax
CMP op1, op2
Operation

NULL ← op1 - op2


Legal Forms 
op1 op2 


CMP reg, idata 
CMP mem, idata 
CMP reg, reg 
CMP reg, mem 
CMP mem, reg 
Description 


This instruction subtracts the contents of op2 from op1 and discards the result. Only 
the EFLAGS register is affected. The following table illustrates how the flags are set 
based on the operand values. 


Condition      Signed Compare          Unsigned Compare
op1 > op2      ZF = 0 and SF = OF      CF = 0 and ZF = 0
op1 >= op2     SF = OF                 CF = 0
op1 = op2      ZF = 1                  ZF = 1
op1 <= op2     ZF = 1 or SF != OF      CF = 1 or ZF = 1
op1 < op2      SF != OF                CF = 1


If op1 is a 16-bit or 32-bit operand and op2 is an 8-bit immediate value, op2 is
sign-extended to match the size of op1.
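
The same flag settings answer both the signed and the unsigned question, which a small
model makes visible. The Python sketch below computes CF, ZF, SF, and OF for op1 - op2
and then reads the conditions the way a conditional jump would; the helper name
`cmp_flags` is an assumption made for illustration.

```python
def cmp_flags(op1, op2, bits=32):
    """Model the flags CMP sets for op1 - op2 at the given operand width."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a, b = op1 & mask, op2 & mask
    r = (a - b) & mask
    return {
        "CF": int(a < b),                            # unsigned borrow
        "ZF": int(r == 0),
        "SF": int(bool(r & sign)),
        "OF": int(bool((a ^ b) & (a ^ r) & sign)),   # signed overflow
    }

f = cmp_flags(-1, 1)          # 0FFFFFFFFH compared with 1
signed_less = f["SF"] != f["OF"]
unsigned_greater = f["CF"] == 0 and f["ZF"] == 0
print(signed_less, unsigned_greater)   # True True
```

Read as signed integers, -1 is less than 1; read as unsigned integers, 0FFFFFFFFH is
greater than 1. The flags encode both answers at once.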


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
Examples

CMP AL, [4211A]             ; 8-bit compare
CMP AX, [BX+3]              ; 16-bit real/virtual mode
CMP CX, [EBP+8][EAX*2]      ; 16-bit protected mode
CMP ESI, 7                  ; 32-bit compare with sign-extended
                            ; op2 operand


CMPS 8086/80186/80286/80386/80486 
Compare String (8/16p/32) 
Syntax 
CMPS 
Operation 


when opcode is (CMPSB, CMPSW, CMPSD) set opsize ← (1, 2, 4)
NULL ← DS:[ESI] - ES:[EDI]
if (DF = 0) then
    ESI ← ESI + opsize
    EDI ← EDI + opsize
else
    ESI ← ESI - opsize
    EDI ← EDI - opsize
endif


Legal Forms 


CMPSB ; Compare string byte 

CMPSW ; Compare string word 

CMPSD ; Compare string doubleword 
Description 


This instruction subtracts the memory operand at ES:EDI from the operand pointed
to by DS:ESI and discards the result, as in the CMP instruction. The size of
the operand is either a byte, word, or doubleword, depending on the opcode used. 
The flags are set as the comparison dictates, and the contents of ESI and EDI are 
modified, either incremented by the size of the operand, or decremented, depend- 
ing on the setting of the DF bit in the EFLAGS register. ESI and EDI are incremented 
when DF = 0. 


You can precede the CMPS instruction with either the REPE or REPNE prefix to
repeatedly compare operands while the ZF bit remains 1 (REPE) or 0 (REPNE).
Register ECX holds the maximum compare count.
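
A repeated compare stops at the first mismatch or when the count runs out, whichever
comes first. The Python sketch below models REPE CMPSB with DF = 0 over two byte
strings; the helper name `repe_cmpsb` is an assumption, and ZF is treated as set on
entry purely for illustration.

```python
def repe_cmpsb(src, dst, ecx):
    """Model REPE CMPSB with DF = 0; returns (ZF, ECX, ESI) on exit."""
    esi = edi = 0
    zf = 1                        # assumption: ZF taken as 1 before the first compare
    while ecx != 0:
        zf = int(src[esi] == dst[edi])   # compare one byte pair
        esi += 1                  # DF = 0: both index registers increment
        edi += 1
        ecx -= 1
        if not zf:                # REPE: stop as soon as ZF becomes 0
            break
    return zf, ecx, esi

print(repe_cmpsb(b"standard", b"standout", 8))   # (0, 2, 6)
```

On exit, ZF tells whether the strings matched, and ECX and ESI show how far the scan
got: here the sixth byte pair differed, leaving two bytes uncompared.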


You can also apply a segment override prefix to the CMPS instruction to override 
the DS segment of the DS:[ESI] operand. You cannot override the ES segment 
assumption for the EDI operand. 


Flags 
OF DF IF TF SF ZF AF PF CF 
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)

Example

        LEA     ESI, standard       ; DS:ESI points to default
        LES     EDI, [EBP+12]       ; ES:EDI loaded from stack frame
        MOV     ECX, 31             ; Count is a constant
        CLD                         ; Ensure direction flag set correctly
        REPE CMPSB                  ; Compare byte string
        JNE     not_eq              ; Branch if strings not equal



CMPXCHG 80486
Compare and Exchange (8/16p/32)

Syntax
CMPXCHG dest, src

Operation

if acc = dest then
    ZF ← 1
    dest ← src
else
    ZF ← 0
    acc ← dest
endif

Legal Forms

         dest  src
CMPXCHG  reg,  reg
CMPXCHG  mem,  reg

Description

The value of dest is read and compared with the accumulator (AL, AX, or EAX). If
the values are equal, the value of src is written to location dest; otherwise, the
accumulator value is replaced by dest. The flags are set as if a CMP acc,dest
instruction had been executed.

When preceded by the LOCK prefix, this instruction is very useful for
multiprocessor semaphore operations.

Notice that this instruction always generates both a read and a write cycle. If the
compare succeeds, src is written to location dest; otherwise, the original value of
dest is written back.
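
The compare-and-exchange step can be modeled as a pure function that returns the new
accumulator, the new destination, and ZF. The Python sketch below is illustrative only:
the helper name `cmpxchg` is an assumption, and nothing here models the LOCK prefix or
the bus cycles.

```python
def cmpxchg(acc, dest, src):
    """Model CMPXCHG; returns (new acc, new dest, ZF)."""
    if acc == dest:
        return acc, src, 1        # match: src is written to dest
    return dest, dest, 0          # mismatch: acc receives the current dest

sema = 0                                  # 0 = available, 1 = held
acc, sema, zf = cmpxchg(0, sema, 1)       # first attempt acquires it
print(sema, zf)                           # 1 1
acc, sema, zf = cmpxchg(0, sema, 1)       # second attempt finds it held
print(acc, sema, zf)                      # 1 1 0
```

A failed attempt leaves the current semaphore value in the accumulator, so the caller
can inspect it without a second read.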


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)


Example
XOR      AL, AL         ; AL ← 0, semaphore available value
MOV      BL, 1          ; Semaphore hold value
CMPXCHG  sema, BL       ; Compare
JNE      failed         ; Semaphore already held



CWD 8086/80186/80286/80386/80486 
Convert Word to Doubleword (16) 


Syntax 
CWD 


Operation 


if (BIT(AX, 15) = 1) then
    DX ← 0FFFFH
else
    DX ← 0
endif


Legal Form
CWD


Description 


This instruction sign-extends the word in AX to the DX:AX register pair. The 
preferred 16-bit to 32-bit conversion instruction is CWDE. CWD is used by the 8086 
and 80286, which do not have 32-bit registers. 


Flags 
OF DF IF TF SF ZF 


Faults 

None. 

Example 

MOV   AX, dividend      ; Get 16-bit signed dividend
CWD                     ; Sign-extend to DX:AX
IDIV  CX                ; Signed 16-bit division

CWDE 80386/80486
Convert Word to Doubleword Extended (16)
Syntax
CWDE
Operation


if (BIT(EAX, 15) = 1) then
    EAX ← EAX | 0FFFF0000H
else
    EAX ← EAX & 0000FFFFH
endif


Legal Form 
CWDE 


Description 


This instruction sign-extends the 16-bit value in AX to a full 32 bits in the EAX 
register. 


Flags 
OF DF IF TF SF ZF 


Faults 
None. 

Example
MOV   AX, short_int     ; Get 16-bit signed value
NEG   AX                ; Convert to negative number
CWDE                    ; Return 32-bit result
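

As a rough illustration of the two sign-extension instructions, the following Python sketch follows the Operation pseudocode for CWD and CWDE. The function names and the convention of passing register values as unsigned integers are assumptions made for this example.

```python
def cwd(ax):
    """Model of CWD: returns (DX, AX) for a 16-bit AX value."""
    ax &= 0xFFFF
    dx = 0xFFFF if ax & 0x8000 else 0x0000
    return dx, ax


def cwde(eax):
    """Model of CWDE: returns the new 32-bit EAX value."""
    ax = eax & 0xFFFF
    # Bit 15 of AX is copied into bits 16-31; the old upper half is discarded.
    return (ax | 0xFFFF0000) if ax & 0x8000 else ax
```

Note that CWDE ignores whatever was previously in the upper half of EAX, which is why the model masks the input to 16 bits first.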

DAA 8086/80186/80286/80386/80486
Decimal Adjust AL After Addition (8)


Syntax
DAA


Operation


if (AF | ((AL & 0FH) > 9)) then
    AL ← AL + 6
    AF ← 1
else
    AF ← 0
endif
if (CF | (AL > 9FH)) then
    AL ← AL + 60H
    CF ← 1
else
    CF ← 0
endif


Legal Form 
DAA 


Description 


This instruction ensures that AL contains a valid decimal result after an addition of 
two packed BCD values. 


Flags 
OF DF IF TF SF ZF AF PF CF


Faults 

None. 

Example 

MOV AL, 72H ; 72 in packed decimal 
ADD AL, 19H ; Yields 8BH in AL 
DAA ; Adjusts AL to 91H 
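

The adjustment can be traced with a small Python model. This sketch transcribes the Operation pseudocode for DAA as given in this entry; the function name daa and the 8-bit masking are assumptions of the model, and the AF and CF inputs default to 0 as they would be after the ADD in the example.

```python
def daa(al, af=0, cf=0):
    """Model of DAA: returns (AL, AF, CF) after the adjustment."""
    if af or (al & 0x0F) > 9:
        al = (al + 6) & 0xFF      # fix up the low decimal digit
        af = 1
    else:
        af = 0
    if cf or al > 0x9F:
        al = (al + 0x60) & 0xFF   # fix up the high decimal digit
        cf = 1
    else:
        cf = 0
    return al, af, cf


# 72H + 19H leaves 8BH in AL; DAA adjusts it to 91H.
result = daa((0x72 + 0x19) & 0xFF)
```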

DAS 8086/80186/80286/80386/80486
Decimal Adjust AL After Subtraction (8)


Syntax
DAS


Operation


if (AF | ((AL & 0FH) > 9)) then
    AL ← AL - 6
    AF ← 1
else
    AF ← 0
endif
if (CF | (AL > 9FH)) then
    AL ← AL - 60H
    CF ← 1
else
    CF ← 0
endif


Legal Form 
DAS 


Description 


This instruction ensures that AL contains a valid decimal result after a subtraction of 
two packed BCD values. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV AL, 42H ; 42 in packed decimal 
SUB AL, 13H ; Yields 2FH in AL 
DAS ; Adjusts AL to 29H 
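

The subtraction case can be checked the same way. The sketch below transcribes the Operation pseudocode for DAS from this entry; the function name das is an assumption, and the caller supplies the AF and CF values left by the preceding SUB.

```python
def das(al, af=0, cf=0):
    """Model of DAS: returns (AL, AF, CF) after the adjustment."""
    if af or (al & 0x0F) > 9:
        al = (al - 6) & 0xFF      # fix up the low decimal digit
        af = 1
    else:
        af = 0
    if cf or al > 0x9F:
        al = (al - 0x60) & 0xFF   # fix up the high decimal digit
        cf = 1
    else:
        cf = 0
    return al, af, cf


# 42H - 13H leaves 2FH in AL; DAS adjusts it to 29H.
result = das((0x42 - 0x13) & 0xFF)
```

When the subtraction borrows (for instance 13H - 42H, which leaves D1H with CF = 1), the model yields 71H with CF still set, the ten's-complement form of -29.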

DEC 8086/80186/80286/80386/80486
Decrement (8/16p/32)
Syntax
DEC op1
Operation


op1 ← op1 - 1


Legal Forms
     op1
DEC  reg
DEC  mem
Description


This instruction subtracts the value 1 from op1. DEC is frequently used to decrement
indexes and therefore does not affect the carry flag (CF). In other respects, it is
equivalent to the instruction:


SUB op1, 1
Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
12    #SS(0)               #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example
DEC   ESI               ; Decrement contents of ESI

DIV 8086/80186/80286/80386/80486
Unsigned Division (8/16p/32)
Syntax
DIV op1
Operation


low(acc) ← acc / op1
high(acc) ← acc modulo op1


Legal Forms
     op1
DIV  reg
DIV  mem
Description


This instruction divides the value in the accumulator register or register pair by op1,
storing the quotient in the low-order portion of the accumulator and the remainder
in the high-order portion. The following table illustrates the registers used as
accumulators, depending on the size of op1.


Size of op1   Dividend   Quotient   Remainder
Byte          AX         AL         AH
Word          DX,AX      AX         DX
Dword         EDX,EAX    EAX        EDX


If the divisor is 0 or if the quotient is too large to fit in the result accumulator, a
divide error fault (interrupt 0) occurs.


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
0     INT 0      INT 0     INT 0
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example

MOV   EAX, dividend
XOR   EDX, EDX          ; Zero-extend 32-bit operand to 64 bits
DIV   EBX               ; 32-bit divide
MOV   quotient, EAX     ; Save result
MOV   remainder, EDX

ENTER 80186/80286/80386/80486
Enter New Stack Frame ()
Syntax


ENTER locals, nesting


Operation


nesting ← nesting mod 32
push (EBP)
temp ← ESP
if (nesting > 0) then
    nesting ← nesting - 1
    while (nesting > 0)
        EBP ← EBP - 4
        push (SS:[EBP])
        nesting ← nesting - 1
    endwhile
    push (temp)
endif
EBP ← temp
ESP ← ESP - locals


Legal Forms 


locals nesting 
ENTER idata, idata 


Description 


This instruction sets up the stack frame used by high-level languages. The form 
ENTER n,0 is equivalent to the instructions: 


PUSH EBP 
MOV EBP, ESP 
SUB ESP, n 


This saves the previous frame pointer (EBP), sets the frame to the current stack top
(ESP), and allocates space for local variables. Parameters passed to the procedure
are addressed as positive offsets from EBP, and local variables are addressed as
negative offsets from EBP.


When the second operand is greater than 0 (which happens only in languages that
allow nesting of procedure definitions), the pointers to previous stack frames are
pushed onto the stack to allow addressing of stack-resident variables whose scopes
are outside the current stack frame.

Languages such as FORTRAN and C do not allow lexical procedure nesting, so they
always use ENTER with a nesting operand of 0. Pascal, Modula-2, and Ada allow
procedure nesting, and compilers for those languages generate the more complex
form of ENTER.
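

To make the nested form concrete, here is a small Python simulation of the ENTER operation. It is a sketch only: the stack is modeled as a dictionary from address to dword, the registers as a dictionary with the keys 'ESP' and 'EBP', and the nesting level is reduced modulo 32; all of these representation choices are assumptions of the example.

```python
def enter(stack, regs, locals_size, nesting):
    """Simulate ENTER locals, nesting on a dict-based stack."""
    def push(value):
        regs['ESP'] -= 4
        stack[regs['ESP']] = value

    nesting %= 32
    push(regs['EBP'])                 # save caller's frame pointer
    temp = regs['ESP']                # address of the new frame
    if nesting > 0:
        nesting -= 1
        while nesting > 0:
            regs['EBP'] -= 4
            push(stack[regs['EBP']])  # copy an enclosing frame pointer
            nesting -= 1
        push(temp)                    # pointer to the new frame itself
    regs['EBP'] = temp
    regs['ESP'] -= locals_size        # reserve space for locals
    return regs


# ENTER 4, 0 with ESP = 1000H and EBP = 2000H
simple = enter({}, {'ESP': 0x1000, 'EBP': 0x2000}, 4, 0)
```

With a nesting operand of 0 the result matches the PUSH EBP / MOV EBP, ESP / SUB ESP, n sequence: EBP ends at 0FFCH and ESP at 0FF8H.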


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
14    #PF(ec)              #PF(ec)
Example
ENTER 4, 0              ; Create stack frame with
                        ; space for a dword local

HLT 8086/80186/80286/80386/80486
Halt ()


Syntax 
HLT 


Legal Form 
HLT 


Description 


This instruction stops all further processing. No other instructions will execute until 
the processor is reset or an interrupt occurs. An NMI interrupt always brings the 
processor out of the halt state. The IF flag must be 1 for any other hardware inter- 
rupt to be acknowledged. After processing the interrupt, execution continues with 
the instruction immediately following HLT. 


You must execute at a CPL of 0 to issue a HLT instruction; otherwise, a general
protection fault occurs.


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
13    #GP(0)               #GP(0)

Example
      STI
L1:   HLT               ; Idle, processing only interrupts
      JMP   L1

IDIV 8086/80186/80286/80386/80486
Integer (Signed) Division (8/16p/32)
Syntax
IDIV op1
Operation


low(acc) ← acc / op1
high(acc) ← acc modulo op1


Legal Forms
      op1
IDIV  reg
IDIV  mem
Description


This instruction divides the value in the accumulator register or register pair by op1,
storing the quotient in the low-order portion of the accumulator and the remainder
in the high-order portion. The following table illustrates the registers used as
accumulators, depending on the size of op1.


Size of op1   Dividend   Quotient   Remainder
Byte          AX         AL         AH
Word          DX,AX      AX         DX
Dword         EDX,EAX    EAX        EDX


If the divisor is 0 or if the quotient is too large to fit in the result accumulator, a
divide error fault (interrupt 0) occurs.


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
0     INT 0      INT 0     INT 0
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example

MOV   EAX, [ESP+14]     ; Get dividend
CDQ                     ; Convert to 64 bits
IDIV  ECX

IMUL 8086/80186/80286/80386/80486
Integer (Signed) Multiplication (8/16p/32)
Syntax


IMUL op1, [op2, [op3]]


Operation


dest ← multiplier * multiplicand


Legal Forms
      op1   op2   op3
IMUL  reg                 ; acc ← acc * reg
IMUL  mem                 ; acc ← acc * mem
IMUL  reg,  reg           ; op1 ← op1 * op2
IMUL  reg,  mem           ; op1 ← op1 * op2
IMUL  reg,  idata         ; op1 ← op1 * op2
IMUL  reg,  reg,  idata   ; op1 ← op2 * op3
IMUL  reg,  mem,  idata   ; op1 ← op2 * op3


Description


This instruction multiplies signed, two's complement integers. The flags are left in
an unknown state except for OF and CF, which are cleared to 0 if the result of the
multiplication fits in the same size (byte, word, or dword) as the multiplicand and
are set to 1 otherwise.


In the single operand form of the instruction, the result is placed in AX if op1 is a
byte, DX:AX if op1 is a word, and EDX:EAX if op1 is a dword.


In the forms of IMUL that use 2 or 3 operands, the operands must all be the same
size.
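

The OF/CF rule can be illustrated with a short Python model. This is a sketch, not a description of the hardware: the names to_signed and imul, and the choice to return the truncated result, the double-width result, and a single overflow indicator (standing for both OF and CF), are assumptions of the example.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def imul(a, b, bits):
    """Model of IMUL on bits-wide operands.

    Returns (low, full, overflow): the truncated product, the
    double-width product, and 1 if the product does not fit in bits.
    """
    product = to_signed(a, bits) * to_signed(b, bits)
    fits = -(1 << (bits - 1)) <= product < (1 << (bits - 1))
    low = product & ((1 << bits) - 1)
    full = product & ((1 << (2 * bits)) - 1)
    return low, full, 0 if fits else 1
```

For example, imul(0xFF, 2, 8) multiplies -1 by 2 and returns (0xFE, 0xFFFE, 0), while imul(0x40, 4, 8) returns (0x00, 0x0100, 1) because 256 cannot be represented in a signed byte.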


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Examples
IMUL  ECX               ; EDX:EAX ← EAX * ECX
IMUL  AX, CX, 7         ; AX ← CX * 7

IN 8086/80186/80286/80386/80486
Input from I/O Port (8/16p/32)


Syntax


IN acc, port


Operation
acc ← (port)


Legal Forms


    acc   port
IN  acc,  idata
IN  acc,  DX
Description 


This instruction reads a byte, word, or dword into the specified accumulator from 
the designated I/O port. If you use an immediate data value in the instruction, 
you can address only the first 256 ports. If the port is specified in the DX register, 
you can access any of the 65536 ports. 


IN is a privileged instruction. A procedure that attempts to execute an input instruc- 
tion must satisfy one of two conditions to avoid a general protection fault. 


If the procedure that executes an IN instruction has I/O privilege (that is, if its CPL 
is numerically less than or equal to the IOPL field in the EFLAGS register), the input 
instruction executes immediately. 


If the procedure does not have I/O privilege, the I/O permission bitmap for the cur- 
rent task is checked. If the bit(s) corresponding to the I/O port(s) is cleared to 0, the 
input instruction executes. If the bit(s) is set to 1, or the port(s) is outside the range 
of the bitmap, a general protection fault occurs. See Chapter 5 for more details on 
this feature. 


If the IN instruction is encountered while in V86 mode, only the I/O permission
bitmap is tested. The IOPL value is not a factor in validating access to the port.
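

The access check described above can be summarized in a few lines of Python. This is a sketch under assumptions: the function name io_allowed is hypothetical, the bitmap is passed as a bytes object with one bit per port (bit n mod 8 of byte n div 8), and width is the number of consecutive byte ports touched (1, 2, or 4).

```python
def io_allowed(cpl, iopl, bitmap, port, width=1, v86=False):
    """Return True if the I/O access executes, False if it would fault."""
    if not v86 and cpl <= iopl:
        return True                       # procedure has I/O privilege
    for p in range(port, port + width):
        index, bit = divmod(p, 8)
        if index >= len(bitmap):
            return False                  # port lies outside the bitmap
        if (bitmap[index] >> bit) & 1:
            return False                  # bit set to 1: access denied
    return True                           # every bit is cleared to 0


bitmap = bytes([0b00000100])              # only port 2 is protected
```

With this bitmap, a ring 3 procedure with IOPL = 0 may read port 0 but not port 2, and a word access at port 1 also fails because it covers port 2.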


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
13    #GP(0)               #GP(0)
Examples
IN    AX, 72H           ; Input a 16-bit value
                        ; from ports 72H and 73H
MOV   DX, crt_port
IN    AL, DX            ; Input a byte value

INC 8086/80186/80286/80386/80486
Increment (8/16p/32)
Syntax
INC op1
Operation


op1 ← op1 + 1


Legal Forms
     op1
INC  reg
INC  mem
Description


This instruction adds the value 1 to op1. This instruction is often used to increment
indexes and therefore does not affect the carry flag (CF). In other respects, it is
equivalent to the instruction:


ADD op1, 1
Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example
INC   ESI               ; Increment contents of ESI

INS 80186/80286/80386/80486
Input String from I/O Port (8/16p/32)
Syntax
INS
Operation


when opcode is (INSB, INSW, INSD), set opsize ← (1, 2, 4)
ES:[EDI] ← port(DX)
if (DF = 0) then
    EDI ← EDI + opsize
else
    EDI ← EDI - opsize
endif


Legal Forms 


INSB ; Input string byte 

INSW ; Input string word 

INSD ; Input string doubleword 
Description 


This instruction allows the location specified by ES:[EDI] to receive data input from
the I/O port contained in the DX register. An 8-bit operation (INSB) adjusts the
address in EDI by 1, a 16-bit operation (INSW) adjusts EDI by 2, and a 32-bit operation
(INSD) adjusts EDI by 4. The memory offset in EDI is incremented if the DF bit is 0
or is decremented if DF is 1.


Like the IN instruction, the INS instruction is privileged. The executing procedure 
must have a CPL equal to or numerically less than the IOPL, or access to the port 
specified in DX must be granted by the I/O permission bitmap in the TSS. 


You can use the REP prefix with the INS instruction. Using the prefix causes register 
ECX to be interpreted as an instruction count. 


A segment override prefix does not affect the INS instruction. The destination seg- 
ment is always ES. 


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Examples
LEA   EDI, new_val      ; Set up destination pointer
MOV   DX, 370H          ; Set up port address
CLD
INSD                    ; Input 32-bit value to new_val
INSD                    ; Input value to new_val + 4

INT 8086/80186/80286/80386/80486
Software Interrupt ()


Syntax


INT vector


Operation


push(EFLAGS)
push(CS)
push(EIP)
TF ← 0
if (IDT(vector).TYPE = INTERRUPT_GATE) then
    IF ← 0
endif
CS:EIP ← destination(IDT(vector))


Legal Form 


vector 
INT idata 
Description 


This instruction saves the current flags and execution location on the stack, and the 
vector operand indicates the IDT entry that is selected. The gate from the IDT de- 
termines the new execution location. 


If the processor encounters the INT instruction while in V86 mode, the 80386
switches to the ring 0 stack (SS0:ESP0) taken from the V86 task state segment before
processing the interrupt. Because the processor is running in ring 0, the IDT entry
must have a DPL of 0; otherwise, a general protection fault occurs.


The INT 3 instruction is usually encoded as a single byte (0CCH) and used as a
breakpoint instruction for debuggers.


Flags 
OF DF IF TF SF ZF 
Faults
      PM          RM        V8086
10    #TS(sel)
11    #NP(sel)
12    #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)

Example

INT   42                ; Make a system-dependent OS call

INTO 8086/80186/80286/80386/80486
Interrupt on Overflow ()


Syntax 
INTO 


Operation 


if (OF) then 
INT 4 
endif 


Legal Form 
INTO 


Description 


This instruction executes an INT 4 instruction if the overflow bit (OF) in the 
EFLAGS register is 1. See the INT instruction for further details. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM          RM        V8086
10    #TS(sel)
11    #NP(sel)
12    #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)

Example

ADD   ECX, VECTOR[EDI*4]   ; Arithmetic operation
INTO                       ; Check for overflow

INVD 80486
Invalidate Cache ()
Syntax 
INVD 
Operation 


The internal cache is invalidated. 


Legal Form 
INVD 


Description 


The internal cache is invalidated. A special hardware bus cycle is also initiated, 
which can be used to invalidate external cache hardware. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example

INVD                       ; Invalidate old cache
MOV   EAX, CR0             ; Get CR0
AND   EAX, NOT 60000000H   ; Enable cache (clear CD and NW bits)
MOV   CR0, EAX             ; Rewrite CR0

INVLPG 80486
Invalidate TLB Entry (32)
Syntax
INVLPG mem
Operation


if PTE(mem) is in TLB(i) then
    invalidate TLB(i)
endif


Legal Form 
INVLPG mem 


Description 


If the page table entry for the page containing address mem is in the TLB, then that 
TLB entry is invalidated. 


Flags
OF DF IF TF SF ZF


Faults
      PM         RM        V8086
6*    #UD()      INT 6


*The undefined opcode fault occurs only when the operand is
encoded as a register.


Example
INVLPG [ESI+4]          ; Invalidate PTE for this address

IRET 8086/80186/80286/80386/80486
Interrupt Return ()
Syntax
IRET
Operation


if (NT = 1) then
    task_return (TSS.back_link)
else
    pop (EIP)
    pop (CS)
    pop (EFLAGS)
endif


Legal Form 
IRET 


Description 
This instruction signals a return from an interrupt or, if the NT (nested task) bit is 
set to 1, a task switch from the current task to the one that invoked it. 


When the new value of EFLAGS is popped from the stack, the IOPL bits are modi- 
fied only if the CPL is 0. 


Chapter 5 discusses transitions across protection rings and task switching. 


If the IRET instruction executes while the processor is in V86 mode, a general pro- 
tection fault occurs. It is the responsibility of the fault handler to emulate the real- 
mode IRET for the V86 task. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
11
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example
IRET

Jcc 8086/80186/80286/80386/80486
Jump if Condition ()
Syntax 
Jcc offset 
Operation 


if (cc) then 
EIP — EIP + sign_extend(offset) 
endif 


Legal Forms 


JA     offset ; Jump above (unsigned x > y) / CF = 0 & ZF = 0
JAE    offset ; Jump above or equal / CF = 0
JB     offset ; Jump below (unsigned x < y) / CF = 1
JBE    offset ; Jump below or equal / CF = 1 | ZF = 1
JC     offset ; Jump if carry / CF = 1
JCXZ   offset ; Jump if CX = 0
JECXZ  offset ; Jump if ECX = 0
JE     offset ; Jump equal / ZF = 1
JG     offset ; Jump greater (signed x > y) / SF = OF & ZF = 0
JGE    offset ; Jump greater or equal / SF = OF
JL     offset ; Jump less (signed x < y) / SF != OF
JLE    offset ; Jump less or equal / SF != OF | ZF = 1
JNA    offset ; Jump not above (JBE)
JNAE   offset ; Jump not above or equal (JB)
JNB    offset ; Jump not below (JAE)
JNBE   offset ; Jump not below or equal (JA)
JNC    offset ; Jump no carry / CF = 0
JNE    offset ; Jump not equal / ZF = 0
JNG    offset ; Jump not greater (JLE) / SF != OF | ZF = 1
JNGE   offset ; Jump not greater or equal (JL)
JNL    offset ; Jump not less (JGE)
JNLE   offset ; Jump not less or equal (JG)
JNO    offset ; Jump no overflow / OF = 0
JNP    offset ; Jump no parity / PF = 0
JNS    offset ; Jump no sign / SF = 0
JNZ    offset ; Jump not 0 / ZF = 0
JO     offset ; Jump if overflow / OF = 1
JP     offset ; Jump if parity / PF = 1
JPE    offset ; Jump parity even / PF = 1
JPO    offset ; Jump parity odd / PF = 0
JS     offset ; Jump if sign / SF = 1
JZ     offset ; Jump if 0 / ZF = 1

Description


The Jcc instructions test the conditions described for each mnemonic. If the condi- 
tion holds true, the processor branches to the specified location. If the condition is 
false, execution continues with the instruction following the jump. 


More than one mnemonic exists for the same condition. This lets you write the test 
in a manner most appropriate for the condition. For example, after OR EAX, EAX 
you would use JZ, and after CMP EAX,ESI you would use JE; both mnemonics test 
for ZF = 1. 
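

The difference between the signed and unsigned mnemonics shows up clearly in a small Python model. This sketch assumes a helper cmp_flags that computes CF, ZF, SF, and OF the way a CMP x, y (that is, x - y) would, and a table COND covering a few of the mnemonics; the names cmp_flags, COND, and taken are illustrative, not part of any assembler.

```python
def cmp_flags(x, y, bits=32):
    """Flags produced by CMP x, y on bits-wide operands."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    x &= mask
    y &= mask
    r = (x - y) & mask
    return {
        'CF': int(x < y),                               # unsigned borrow
        'ZF': int(r == 0),
        'SF': int(bool(r & sign)),
        'OF': int(bool((x ^ y) & (x ^ r) & sign)),      # signed overflow
    }


COND = {
    'JA':  lambda f: f['CF'] == 0 and f['ZF'] == 0,
    'JAE': lambda f: f['CF'] == 0,
    'JB':  lambda f: f['CF'] == 1,
    'JBE': lambda f: f['CF'] == 1 or f['ZF'] == 1,
    'JE':  lambda f: f['ZF'] == 1,
    'JNE': lambda f: f['ZF'] == 0,
    'JG':  lambda f: f['SF'] == f['OF'] and f['ZF'] == 0,
    'JGE': lambda f: f['SF'] == f['OF'],
    'JL':  lambda f: f['SF'] != f['OF'],
    'JLE': lambda f: f['SF'] != f['OF'] or f['ZF'] == 1,
}


def taken(mnemonic, flags):
    """True if the conditional jump would branch with these flags."""
    return COND[mnemonic](flags)
```

Comparing 0FFFFFFFFH with 1 makes both JA and JL branch: the first operand is larger as an unsigned value but is -1 as a signed value.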


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
13    #GP(0)
Example
DEC   AL                ; Decrement AL
JZ    reached_zero      ; Branch if zero

JMP 8086/80186/80286/80386/80486
Near Jump ()


Syntax
JMP dest


Operation
EIP ← dest


Legal Forms


     dest
JMP  offset ; EIP ← EIP + offset
JMP  reg    ; EIP ← reg
JMP  mem    ; EIP ← [mem]
Description 


This instruction loads a new value into the instruction pointer (EIP). Subsequent in- 
structions are fetched beginning at the new location. 


When you use the immediate form of the instruction, the data value is an offset 
from the current EIP. The other forms are indirect branches, that is, the new value 
of EIP is taken from the operand register or memory location. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Examples
JMP   new_label               ; Direct, relative branch
JMP   ECX                     ; Branch indirect
JMP   DWORD PTR [EBP+12]      ; Branch to routine whose
                              ; address is on stack

JMP 8086/80186/80286/80386/80486
Far Jump ()
Syntax
JMP dest
Operation


CS:EIP ← dest


Legal Forms


     dest
JMP  idata ; CS:EIP ← idata
JMP  mem   ; CS:EIP ← [mem]
Description 


A far jump instruction modifies both CS and EIP. In the immediate form of the in- 
struction, a new 48-bit pointer is specified. In the indirect form, the mem operand 
points to a 48-bit selector:offset pointer. 


The new CS selector can be a code segment selector (where the branch is to the 
specified offset within the code segment), or the selector can be a call gate, task 
gate, or task state segment. In this case, the offset portion of the JMP is ignored, and 
the new value of EIP is taken from the gate or the incoming TSS. If the jump causes 
a task switch, all flags are subject to change as EFLAGS reloads from the new task’s 
TSS. Chapter 5 discusses the task switch operation and the use of gates. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM          RM        V8086
10    #TS(sel)
11    #NP(sel)
12    #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)
17    #AC(0)                #AC(0)
Examples

JMP   21A7:000211F3H          ; Direct branch
JMP   FWORD PTR new_task      ; Branch indirect

LAHF 8086/80186/80286/80386/80486
Load AH with Flags (8)
Syntax
LAHF
Operation


AH ← EFLAGS & 0FFH


Legal Form 
LAHF 


Description 


This instruction copies the low-order byte of the EFLAGS register into AH. After the 
instruction executes, the AH register has the following contents: 


  7    6    5    4    3    2    1    0
  SF   ZF   0    AF   0    PF   1    CF


Flags
OF DF IF TF SF ZF


Faults

None.

Example

LAHF
SHR   AH, 6
AND   AH, 1             ; AH now contains the ZF flag
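

The shift-and-mask sequence in the example can be checked with a short Python model. The names lahf and zf_from_ah are assumptions made for this sketch; the model relies only on ZF occupying bit 6 of the low-order byte of EFLAGS.

```python
def lahf(eflags):
    """Model of LAHF: AH receives the low-order byte of EFLAGS."""
    return eflags & 0xFF


def zf_from_ah(ah):
    """Mirror of SHR AH, 6 followed by AND AH, 1."""
    return (ah >> 6) & 1


ah = lahf(0x00000246)      # an EFLAGS image with ZF and PF set
zf = zf_from_ah(ah)
```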

LAR 80286/80386/80486
Load Access Rights (16p/32)
Syntax


LAR dest, select


Operation


if (check_access(select)) then
    ZF ← 1
    dest ← access_rights(descriptor(select)) & 00F?FF00H


Legal Forms 


dest select 
LAR reg, reg 
LAR reg, mem 
Description 


This instruction allows a program to determine whether a given selector is acces- 
sible to it without causing a protection fault. 


If the select operand contains a valid 80386 selector that is accessible to the
executing procedure and the selector type is one defined below, the zero flag (ZF) is set to
1, and the access rights field of the descriptor indicated by the selector is loaded into
the destination register.


If the destination register is a 16-bit register, the high-order 8 bits of the register
contain the access rights field of the descriptor.

 15                 8 7            0
| P | DPL | S | TYPE |              |


If the destination is a 32-bit register, bits 8-15 contain the access rights, and bits
20-23 contain the access extension bits found in byte 6 of the descriptor.

 31      24 23         20 19  16 15                 8 7       0
|          | G | D | 0 | A |    | P | DPL | S | TYPE |         |


If the selector references a nonmemory segment with an invalid type (Type = 0, 8,
0AH, 0DH), ZF is reset and the dest register is not modified.




Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
6                INT 6     #UD()
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example


; Verify that variable X contains the selector of a call gate
; that can be legally invoked by the executing routine.


LAR   AX, X             ; Load access rights
JNZ   no_access         ; Branch if can't access
SHR   AX, 8             ; Move access rights to low order
AND   AX, 1FH           ; Save only S bit and TYPE
CMP   AX, 0CH           ; Test for 386 call gate
JE    is_gate           ; Branch if accessible gate
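

The bit manipulation in this example can also be expressed as a Python sketch. It assumes the 16-bit result layout described above, with the access rights byte (P, DPL, S, TYPE) in bits 8-15; the function names are hypothetical.

```python
def decode_access_rights(lar16):
    """Split the 16-bit LAR result into its access rights fields."""
    ar = (lar16 >> 8) & 0xFF
    return {
        'P':    (ar >> 7) & 1,    # present bit
        'DPL':  (ar >> 5) & 3,    # descriptor privilege level
        'S':    (ar >> 4) & 1,    # 0 = system descriptor or gate
        'TYPE': ar & 0x0F,
    }


def is_386_call_gate(lar16):
    """Mirror of SHR AX, 8 / AND AX, 1FH / CMP AX, 0CH."""
    return ((lar16 >> 8) & 0x1F) == 0x0C
```

A result of 0EC00H, for instance, decodes to P = 1, DPL = 3, S = 0, TYPE = 0CH, which the second function reports as a call gate.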

LEA 8086/80186/80286/80386/80486
Load Effective Address (16p/32)
Syntax 


LEA dest, src 


Operation 


dest < address(src) 


Legal Forms 


dest Src 
LEA reg, mem 
Description 


This instruction loads the address specified by the memory operand into the desti- 
nation register. No memory access cycle takes place. 


You can also use LEA to perform simple multiplication or addition as discussed in
Chapter 4.


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
6*    #UD()      INT 6     #UD()


*The undefined opcode fault occurs only when the src
operand is encoded as a register.


Examples
LEA   ESI, VECTOR[EBX*4]   ; Load address of array element
LEA   EDI, [EAX][ECX]      ; Add contents of EAX and ECX, store in EDI

LEAVE 80186/80286/80386/80486
Leave Current Stack Frame ()


Syntax 
LEAVE 


Operation 


MOV ESP, EBP 
POP EBP 


Legal Form 
LEAVE 


Description 


LEAVE is the counterpart of the ENTER instruction. ENTER is executed immedi- 
ately after a procedure call to set up a new stack frame. LEAVE is executed before a 
RET instruction to release the returning procedure’s stack frame. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
Example
ENTER 4, 0              ; First instruction of procedure
                        ; Procedure contents
LEAVE                   ; Clean up stack frame
RET                     ; And return to caller

LGDT 80286/80386/80486
Load GDT Register ()
Syntax
LGDT op
Operation


GDTR.limit ← [op]
GDTR.base ← [op + 2]


Legal Form


      op
LGDT  mem


Description 


This instruction loads the GDTR register specifying the address and limit of the 
global descriptor table (GDT). The operand must point to a data structure in 
memory whose first 16 bits contain the limit of the global descriptor table and 
whose next 32 bits contain the linear base address of the GDT. 
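

The 6-byte operand layout can be illustrated with Python's struct module. This is a sketch for illustration only; the function names and the sample base and limit values are assumptions, and the bytes are packed little-endian, as on every processor in this family.

```python
import struct


def make_pseudo_descriptor(base, limit):
    """Build the 6-byte LGDT operand: 16-bit limit, then 32-bit base."""
    return struct.pack('<HI', limit, base)


def parse_pseudo_descriptor(raw):
    """Return (base, limit) from a 6-byte LGDT operand."""
    limit, base = struct.unpack('<HI', raw[:6])
    return base, limit


# A GDT of 256 eight-byte descriptors (limit 07FFH) at linear address 100000H
operand = make_pseudo_descriptor(0x00100000, 0x07FF)
```

The same layout applies to the operand of LIDT, with the IDT limit and base in place of the GDT values.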


Loading the GDTR does not invalidate the currently active descriptors; however, 
subsequent references to selectors load descriptors from the new GDT. 


A procedure must have a CPL of 0 to issue the LGDT instruction. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
6*    #UD()      INT 6     #UD()
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)


*The undefined opcode fault only occurs when the instruction
is encoded with a register value for op.


Example
LGDT  initial_table

LIDT 80286/80386/80486
Load IDT Register ()
Syntax
LIDT op
Operation


IDTR.limit ← [op]
IDTR.base ← [op + 2]


Legal Form


      op
LIDT  mem


Description 

This instruction loads the IDTR register and specifies the address and limit of the
interrupt descriptor table (IDT). The operand must point to a data structure in
memory whose first 16 bits contain the limit of the interrupt descriptor table and
whose next 32 bits contain the linear base address of the IDT.


After loading the IDTR, any software or hardware interrupts, faults, or traps will 
cause an access to the new IDT. 


A procedure must have a CPL of 0 to issue the LIDT instruction. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
6*    #UD()      INT 6     #UD()
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)


*The undefined opcode fault only occurs when the op
operand is encoded as a register.


Example
LIDT  new_int_table     ; Load IDT register

LLDT 80286/80386/80486
Load LDT Register (16)


Syntax
LLDT op


Operation
LDTR ← op


Legal Forms
      op
LLDT  reg
LLDT  mem
Description 


This instruction loads a selector into the LDTR register and specifies a new local de- 
scriptor table (LDT). The operand to LLDT must contain a valid local descriptor table 
selector or the value 0. 


Active descriptors that refer to the previous LDT are not invalidated; however, subse- 
quent selector references load descriptors from the new LDT. 


If the LDTR is loaded with the value 0, all LDT selector references that cause a 
memory reference result in a general protection fault. 


The executing procedure must have a CPL of 0 to issue the LLDT instruction. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM          RM        V8086
6                 INT 6     #UD()
11    #NP(sel)
12    #SS(0)
13    #GP(0)
13    #GP(sel)
14    #PF(ec)               #PF(ec)
17    #AC(0)                #AC(0)
Example


LLDT  task_B.ldtr       ; Get access to LDT for task B

LMSW 80286/80386/80486
Load Machine Status Word (16)
Syntax
LMSW op
Operation


CR0 ← (CR0 & 0FFFF0000H) | op


Legal Forms
      op
LMSW  reg
LMSW  mem
Description


This instruction loads the low-order 16 bits of the CR0 register. Use it only when
running 80286 operating system code. On 32-bit systems, use the instruction MOV
CR0, reg. Note that you can use LMSW to enter protected mode but not to leave it
and that you can use MOV CR0, reg to both enter and leave protected mode.


A procedure must be running in ring 0 to execute LMSW. 


Flags
OF DF IF TF SF ZF


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example
LMSW  init_state

LOCK 8086/80186/80286/80386/80486
Assert Hardware LOCK\ Signal Prefix ()
Syntax 
LOCK 


Legal Form 
LOCK 


Description 


The LOCK instruction prefix supports multiprocessor hardware configurations.
You can use the hardware LOCK\ signal to ensure exclusive access to a particular
memory byte, word, or dword. The LOCK instruction is valid only if it precedes an
instruction in the list below. If you use it in combination with another instruction or
in an unsupported form of one of the listed instructions, an undefined opcode fault
occurs.

              Locked Form                     Locked Form
Instruction   of Instruction    Instruction   of Instruction
BT            mem, op           OR            mem, op
BTS           mem, op           SBB           mem, op
BTR           mem, op           SUB           mem, op
BTC           mem, op           XOR           mem, op
XCHG          mem, reg          DEC           mem
XCHG          reg, mem          INC           mem
ADD           mem, op           NEG           mem
ADC           mem, op           NOT           mem
AND           mem, op


The LOCK\ signal is asserted for the duration of the instruction, including the time 
required for a read-modify-write cycle. The XCHG instruction does not require the 
LOCK prefix because the LOCK\ signal is always asserted during a memory XCHG. 


When writing software for multiprocessor systems, ensure that locked access for 
particular memory addresses always occurs to operands of the same size. In other 
words, if you use the dword at physical address 100, always get access to it as a 
dword and never as a byte or word. Locking is not guaranteed to operate correctly 
unless you observe this restriction. 


Flags
OF DF IF TF SF ZF AF PF CF

Faults
      PM         RM        V8086
6     #UD()      INT 6     #UD()
Example
LOCK
BTS   semaphore, 3

LODS 8086/80186/80286/80386/80486
Load String (8/16p/32)
Syntax
LODS
Operation


when opcode is (LODSB, LODSW, LODSD) set opsize ← (1, 2, 4)
acc ← DS:[ESI]
if (DF = 0) then
    ESI ← ESI + opsize
else
    ESI ← ESI - opsize
endif


Legal Forms 


LODSB ; Load string byte 

LODSW ; Load string word 

LODSD ; Load string doubleword 
Description 


This instruction loads the byte, word, or dword at DS:ESI into the accumulator. If the 
DF bit in the EFLAGS register is 0, ESI is incremented by the size of the operand 
(1, 2, or 4 bytes). If DF is 1, ESI is decremented. 


Because LODS is one of the 80386 string instructions, you can precede it with the 
REP prefix; however, the resulting instruction is useless, as it continuously over- 
writes the contents of the accumulator. 


You can precede the LODS instruction with a segment override prefix. In such a
case, the operand is taken from the specified segment.


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example

        LEA    EBX, A_to_E      ; Address of translation table
        MOV    ESI, [EBP+12]    ; Source address
        LES    EDI, [EBP+16]    ; Destination
L1:     LODSB                   ; Fetch byte from source
        OR     AL, AL           ; Test byte for zero
        JZ     DONE             ; Branch if zero
        XLATB                   ; Translate the byte
        STOSB                   ; Save translated version
        JMP    L1
DONE:
LOOPcc 8086/80186/80286/80386/80486
Decrement ECX and Branch ()
Syntax


LOOPcc offset


Operation

ECX ← ECX - 1
if (cc & (ECX != 0)) then
    EIP ← EIP + offset
endif


Legal Forms 


LOOP offset 
LOOPZ offset 
LOOPNZ offset 
LOOPE offset 
LOOPNE offset 


Description 


These instructions support a decrement and branch operation. For all variants other 
than LOOP, the decrement and branch is combined with a test on the ZF bit. A loop 
counter is assumed in register ECX. The instruction decrements the register, and if 
the value of ECX is 0, no branch is taken. No flags are set as a result of the decre- 
ment operation. 


If the value of ECX is not 0, the branch is taken, provided that the condition
tested by the LOOPcc forms is true.
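
The decrement-and-branch rule described above can be sketched as a small C model. This is only an illustration, not code from the processor documentation; the names `LoopKind` and `loop_step` are hypothetical, and the ZF value is passed in by the caller because the decrement itself sets no flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for the three behaviors: LOOP, LOOPZ/LOOPE, LOOPNZ/LOOPNE. */
typedef enum { LOOP_PLAIN, LOOP_Z, LOOP_NZ } LoopKind;

/* Decrement *ecx and report whether the branch would be taken.
   zf is the current ZF value; the decrement does not change it. */
bool loop_step(uint32_t *ecx, LoopKind kind, bool zf)
{
    assert(ecx != NULL);
    *ecx -= 1;                              /* ECX <- ECX - 1 (wraps at 0) */
    bool cc = (kind == LOOP_PLAIN) ||
              (kind == LOOP_Z  &&  zf) ||
              (kind == LOOP_NZ && !zf);
    return cc && (*ecx != 0);               /* branch only if cc and ECX != 0 */
}
```

One consequence the model makes visible: if ECX is 0 on entry, the decrement wraps to 0FFFFFFFFH and the branch is taken, so a zero count must be tested for before entering the loop.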


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
13    #GP(0)     INT 13    #GP(0)

Example
; Initialize array of temp reals to 1.0
        FLD1                      ; Push 1.0 onto NDP stack
        LEA     ESI, array        ; Starting address of array
        MOV     ECX, size         ; Load loop counter
L1:     FLD     ST(0)             ; Duplicate 1.0 value on NDP stack
        FSTP    TBYTE PTR [ESI]   ; Store 1.0, pop NDP stack
        ADD     ESI, 10           ; Advance to next 10-byte temp real
        LOOP    L1                ; Continue while ECX not 0
        FSTP    ST(0)             ; Done--pop last 1.0 constant off
                                  ; NDP stack
Lseg 8086/80186/80286/80386/80486
Load Segment Register (16p/32)


Syntax


Lseg dest, src


Operation


dest ← [src]
seg ← [src + 4]


Legal Forms 


dest Src 
LDS reg, mem 
LES reg, mem 
LFS reg, mem 
LGS reg, mem 
LSS reg, mem 
Description 


The src address specifies a 48-bit pointer (32-bit in real mode or V86 mode) con-
sisting of a 32-bit offset followed by a 16-bit selector. The 32-bit offset is loaded into
the dest register and the selector is loaded into the segment register specified by the
instruction mnemonic. The 80386 protection mechanism validates the descriptor
associated with the selector.


Use only the ESP register with the Lseg instruction. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
LES ESI, BIGPTR       ; Load address of array element [EBX]
LSS ESP, OLD_STACK    ; Load a new stack pointer
LSL 80286/80386/80486
Load Segment Limit (16p/32)
Syntax


LSL dest, select


Operation


if (access_OK(select)) then
    dest ← descript(select).limit
    ZF ← 1
else
    ZF ← 0
endif


Legal Forms 


dest select 
LSL reg, reg 
LSL reg, mem 
Description 


If the select operand is accessible to the executing program as a valid selector under 
the protection rules, this instruction loads the dest register with the segment limit 
from the descriptor indicated by select and sets ZF to 1. 


If the operand is not accessible or the descriptor associated with select does not con- 
tain a limit field, ZF is set to 0. 


The value stored in the dest register is always the offset of the last addressable byte 
in the segment (page granular limits are converted to byte granular limits). There- 
fore, do not use a 16-bit register as the dest operand because the resulting value 
might be too large. 
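
The conversion from a page-granular limit to a byte-granular limit can be illustrated with a short C sketch. It assumes the usual descriptor layout (a 20-bit limit field plus a granularity bit), which is not spelled out in this entry; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of the last addressable byte for a raw 20-bit limit field.
   page_granular models the descriptor's granularity bit (assumed layout). */
uint32_t effective_limit(uint32_t raw_limit, int page_granular)
{
    assert(raw_limit <= 0xFFFFFu);          /* limit field is 20 bits wide */
    if (page_granular)
        return (raw_limit << 12) | 0xFFFu;  /* last byte of the last 4-KB page */
    return raw_limit;
}

/* True if the value could be held in a 16-bit dest register without loss. */
int fits_in_16_bits(uint32_t limit)
{
    return limit <= 0xFFFFu;
}
```

For example, a page-granular limit field of 10H becomes 10FFFH, which no longer fits in 16 bits; this is the situation the warning about 16-bit dest registers refers to.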


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
6                INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
LSL EAX, [BP+12]    ; Get limit of selector on stack
LTR 80286/80386/80486
Load Task Register (16)
Syntax
LTR select
Operation


TR ← select


Legal Forms 


select 
LTR reg 
LTR mem 
Description 


This instruction loads the task register with the selector specified by the operand. 
The TSS descriptor for the selector is marked “busy.” Loading the task register does 
not cause a task switch. 


If the procedure that executes the LTR instruction is not running with a CPL of 0, a
general protection fault occurs.


Flags
OF DF IF TF SF ZF


Faults
      PM          RM       V8086
6                 INT 6    #UD()
10    #NP(sel)
12    #SS(0)
13    #GP(0)
13    #GP(sel)
14    #PF(ec)
17    #AC(0)               #AC(0)

Example
LTR AX    ; Load task register
MOV 8086/80186/80286/80386/80486
Move Data (8/16p/32)
Syntax


MOV dest, src


Operation


dest ← src


Legal Forms


dest src
MOV reg, idata
MOV mem, idata
MOV reg, reg
MOV reg, mem
MOV mem, reg
Description 


This instruction copies the contents of the src operand into dest. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Examples
MOV AL, [ECX]    ; Get byte from memory
MOV ESI, 182H    ; Load ESI with data value
MOV BX, DX       ; 16-bit move
MOV AH, 7FH      ; Load AH with 8-bit data
MOV 8086/80186/80286/80386/80486
Move Selector (16)
Syntax


MOV dest, src


Operation


dest ← src


Legal Forms 


dest src 
MOV sreg, reg 
MOV sreg, mem 
MOV reg, sreg 
MOV mem, sreg 
Description 


This instruction copies the contents of the src operand into the dest operand. If the 
dest operand is a segment register, the instruction loads the descriptor associated 
with the selector into the 80386/80486 shadow registers. Privilege checks and tests 
for descriptor legality are made unless the selector value is 0. A protection fault oc- 
curs if 0 is loaded into the SS register. 


When the SS register is loaded, all hardware interrupts (including NMI) are masked 
until after the next instruction executes, to allow loading of the ESP register. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM          RM        V8086
10    #NP(sel)
12    #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)
17    #AC(0)                #AC(0)

Examples
MOV DS, AX          ; Load new data segment
MOV ES, heap_seg    ; Load ES register
MOV save_ss, SS     ; Store copy of SS register
MOV 80386/80486
Move Special (32)
Syntax


MOV dest, src


Operation


dest ← src


Legal Forms 


dest Src 
MOV reg, reg 
Description 


This instruction copies or loads a special CPU register to or from an 80386/80486
general register. The special registers are CR0, CR2, CR3, DR0, DR1, DR2, DR3, DR6,
DR7, TR6, and TR7.


A procedure must be running at a CPL of 0 to execute this instruction. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
13    #GP(0)               #GP(0)
17    #AC(0)               #AC(0)

Examples
MOV EAX, CR0    ; Save CR0 in EAX
MOV TR7, ECX    ; Load test register 7
MOVS 8086/80186/80286/80386/80486
Move String (8/16p/32)
Syntax
MOVS
Operation


when opcode is (MOVSB, MOVSW, MOVSD) set opsize ← (1, 2, 4)
ES:[EDI] ← DS:[ESI]
if (DF = 0) then
    ESI ← ESI + opsize
    EDI ← EDI + opsize
else
    ESI ← ESI - opsize
    EDI ← EDI - opsize
endif


Legal Forms 


MOVSB ; Move string byte 

MOVSW ; Move string word 

MOVSD ; Move string doubleword 
Description 


This instruction copies the memory operand pointed to by DS:ESI to the destination 
address specified by ES:EDI. The operand is a byte, word, or doubleword, depend- 
ing on the opcode specified. The EDI and ESI registers are incremented by the size 
of the operand if the DF bit is 0 or decremented if the DF bit is 1. 


You can apply the REP prefix to the MOVS instruction to repeat the instruction. You 
must place the value specifying the repeat count in the ECX register. 


A segment override prefix may be applied to the MOVS instruction. It will override
the DS segment of the DS:[ESI] operand. You cannot override the ES segment
assumption for the EDI operand.


For dword-aligned strings, a REP MOVSD transfers data quicker than does the 
equivalent REP MOVSB or REP MOVSW. However, if the source and destination 
strings overlap, only the REP MOVSB operation works correctly. 
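
The byte-at-a-time behavior of REP MOVSB can be modeled in C to see what happens when the source and destination overlap. This is a simplified sketch: it treats memory as one flat array, ignores segments, and uses the hypothetical name `rep_movsb`. It also shows that, for overlapping byte copies, the result depends on the direction selected by DF.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of REP MOVSB inside a single flat buffer.
   esi and edi are indices into mem, ecx is the repeat count,
   df is the direction flag (0 = ascending, 1 = descending). */
void rep_movsb(uint8_t *mem, size_t mem_size,
               size_t esi, size_t edi, size_t ecx, int df)
{
    assert(mem != NULL);
    while (ecx != 0) {
        assert(esi < mem_size && edi < mem_size);
        mem[edi] = mem[esi];                 /* ES:[EDI] <- DS:[ESI] */
        if (df == 0) { esi++; edi++; }       /* opsize is 1 for MOVSB */
        else         { esi--; edi--; }
        ecx--;
    }
}
```

Copying the four bytes 1, 2, 3, 4 to a destination two bytes higher gives 1, 2, 1, 2 with DF = 0, because the first bytes written are read again as source. Starting at the high end with DF = 1 preserves the original 1, 2, 3, 4.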


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
LEA ESI, copyright_msg    ; Get source string
LES EDI, [EBP+12]         ; ES:EDI loaded from stack frame
MOV ECX, 31               ; Size of source string
CLD                       ; Ensure direction flag set correctly
REP MOVSB                 ; Copy byte string


MOVSX 80386/80486
Move with Sign Extension (8/16p/32)


Syntax 
MOVSX dest, src 


Operation


dest ← sign_extend(src)


Legal Forms


dest src
MOVSX reg, reg
MOVSX reg, mem
Description 


This instruction copies an 8-bit operand to a 16-bit or 32-bit destination or a 16-bit
operand to a 32-bit destination and sign-extends the source operand to fit. Sign
extension is performed by duplicating the high-order bit of the src throughout the
upper bits of the dest operand.
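
The bit duplication described above can be written out in C. The sketch below is illustrative only; `sign_extend` and `zero_extend` are hypothetical helper names, and the zero-extending variant is included for contrast with MOVZX.

```c
#include <assert.h>
#include <stdint.h>

/* Extend an 8-bit or 16-bit value to 32 bits by copying its high-order bit
   into every upper bit of the result (the MOVSX behavior). */
uint32_t sign_extend(uint32_t src, unsigned src_bits)
{
    assert(src_bits == 8 || src_bits == 16);
    uint32_t mask = (1u << src_bits) - 1u;   /* bits that belong to src */
    uint32_t sign = 1u << (src_bits - 1);    /* high-order bit of src   */
    src &= mask;
    return (src & sign) ? (src | ~mask) : src;
}

/* Extend by filling the upper bits with 0 (the MOVZX behavior). */
uint32_t zero_extend(uint32_t src, unsigned src_bits)
{
    assert(src_bits == 8 || src_bits == 16);
    return src & ((1u << src_bits) - 1u);
}
```

With a source byte of 80H, sign extension yields 0FFFFFF80H while zero extension yields 00000080H. Truncating the sign-extended result to 16 bits gives the byte-to-word form.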


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Examples
MOVSX EAX, AL                ; Extend byte to dword
MOVSX EDI, WORD PTR [ESI]    ; Extend word to dword
MOVSX CX, DL                 ; Extend byte to word
MOVZX 80386/80486
Move with Zero Extension (8/16p/32)
Syntax


MOVZX dest, src


Operation


dest ← src


Legal Forms


dest src
MOVZX reg, reg
MOVZX reg, mem
Description 


This instruction copies an 8-bit operand to a 16-bit or 32-bit destination or a 16-bit
operand to a 32-bit destination and zero-extends the source operand to fit. Zero
extension is performed by filling the upper bits of the dest operand with 0.


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Examples 
MOVZX EAX, AL ; Extend byte to dword 
MOVZX EDI, WORD PTR [ESI] ; Extend word to dword 
MOVZX CX, DL ; Extend byte to word 
MUL 8086/80186/80286/80386/80486
Unsigned Multiplication (8/16p/32)
Syntax
MUL src
Operation


acc ← acc * src


Legal Forms


src
MUL reg
MUL mem
Description 


This instruction performs unsigned integer multiplication and requires only one 
operand, the multiplier. The multiplicand is the accumulator, and the product is also 
stored in the accumulator. The size of the src operand determines which registers 
will be used, as illustrated in the following table: 


Multiplier (src)    Multiplicand    Product

byte                AL              AX
word                AX              DX:AX
dword               EAX             EDX:EAX


The flags are left in an undetermined state except for OF and CF, which are cleared
to 0 if the high-order byte, word, or dword of the product is 0; otherwise they are
set to 1.
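
The dword case of the table can be modeled in C with a 64-bit product. This is a sketch only; `Mul32Result` and `mul32` are hypothetical names, and a single field stands in for CF and OF because the two flags always receive the same value here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t eax;    /* low-order dword of the product  */
    uint32_t edx;    /* high-order dword of the product */
    int      cf_of;  /* value given to both CF and OF   */
} Mul32Result;

/* Model of MUL with a dword operand: EDX:EAX <- EAX * src. */
Mul32Result mul32(uint32_t eax, uint32_t src)
{
    uint64_t product = (uint64_t)eax * src;
    Mul32Result r;
    r.eax   = (uint32_t)product;
    r.edx   = (uint32_t)(product >> 32);
    r.cf_of = (r.edx != 0);      /* flags set when the high dword is nonzero */
    assert((((uint64_t)r.edx << 32) | r.eax) == product);
    return r;
}
```

Testing the carry flag after the multiply is therefore a cheap way to detect that the product did not fit in 32 bits.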


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
MOV EAX, 3                ; Multiplicand
MUL DWORD PTR [ESI]       ; Multiplier
JC  res_64                ; Branch if result requires 64 bits
MOV res_32, EAX           ; Else store product
NEG 8086/80186/80286/80386/80486
Negate Integer (8/16p/32)


Syntax
NEG op


Operation
op ← -(op)


Legal Forms
op


NEG reg
NEG mem
Description 


This instruction subtracts its operand from 0, which results in a two’s complement 
(integer) negation of the operand. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
; Compute absolute value
        OR      EAX, EAX    ; Test for +/-
        JNS     SKIP        ; Jump if not signed (positive)
        NEG     EAX         ; Negate negative number
SKIP:
NOP 8086/80186/80286/80386/80486
No Operation ()
Syntax 
NOP 


Legal Form 
NOP 


Description 


This instruction performs no function other than taking up space in the code 
segment. 


Flags 
OF DF IF TF SF ZF


Faults 


None. 


Example 
NOP ; Nothing occurs 
NOT 8086/80186/80286/80386/80486
Boolean Complement (8/16p/32)


Syntax
NOT op


Operation
op ← ~op


Legal Forms
op


NOT reg
NOT mem
Description 


This instruction inverts the state of each bit in the operand. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
NOT ECX    ; Invert ECX
OR 8086/80186/80286/80386/80486
Boolean OR (8/16p/32)
Syntax
OR dest, src
Operation


dest ← dest | src


Legal Forms


dest src
OR reg, idata
OR mem, idata
OR reg, reg
OR reg, mem
OR mem, reg
Description


This instruction performs a Boolean OR operation between each bit of the src
operand and the dest operand. The result is stored in dest. The truth table defining
the OR operation is as follows:


0 | 0 = 0
0 | 1 = 1
1 | 0 = 1
1 | 1 = 1


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
OR AL, 80H    ; Set high bit of AL


OUT 8086/80186/80286/80386/80486 
Output to Port (8/16p/32) 
Syntax 


OUT port, acc 


Operation 


port ← acc


Legal Forms


port acc
OUT idata, acc
OUT DX, acc
Description 


This instruction outputs the value in the accumulator to the specified data port. 
Placing an immediate value in the port operand field lets you address ports 0-255. 
You can address port addresses 0—65,535 by storing the port number in the DX 
register. 


OUT is a privileged instruction. A procedure executing an output instruction must 
satisfy one of two conditions; otherwise, a general protection fault occurs. 


If the procedure that executes an OUT instruction has I/O privilege (if its CPL is
numerically less than or equal to the IOPL field in the EFLAGS register), the output
instruction executes immediately.


If the procedure does not have I/O privilege, the I/O permission bitmap for the cur- 
rent task is checked. If the bit(s) corresponding to the I/O port(s) is cleared to 0, the 
output instruction executes. If the bit(s) is set to 1, or the port(s) is outside the range 
of the bitmap, a general protection fault occurs. See Chapter 5 for more details on 
this feature. 


If the OUT instruction is encountered while in V86 mode, only the I/O permission
bitmap is tested. The IOPL value is not a factor.
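
The two-stage check described above can be summarized in a C sketch. It is a simplified model, not the processor's actual algorithm: the function name `io_allowed` and its parameters are hypothetical, the bitmap is passed as a plain byte array with one bit per port, and details of how the bitmap is located in the TSS are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true if an I/O access of `width` bytes starting at `port` would
   execute, false if it would raise a general protection fault.
   bitmap_ports is the number of ports the bitmap covers. */
bool io_allowed(unsigned cpl, unsigned iopl, bool v86,
                const uint8_t *bitmap, size_t bitmap_ports,
                uint16_t port, unsigned width)
{
    assert(cpl <= 3 && iopl <= 3);
    assert(width == 1 || width == 2 || width == 4);

    if (!v86 && cpl <= iopl)
        return true;                     /* I/O privilege: no bitmap check */

    for (unsigned i = 0; i < width; i++) {
        size_t p = (size_t)port + i;     /* every port touched is checked  */
        if (p >= bitmap_ports)
            return false;                /* outside the range of the bitmap */
        if (bitmap[p / 8] & (1u << (p % 8)))
            return false;                /* bit set to 1: access denied    */
    }
    return true;                         /* all bits cleared to 0          */
}
```

Note that a word or dword access touches more than one port, so a single set bit among the covered ports is enough to deny the whole access.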


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
13    #GP(0)               #GP(0)

Example
MOV DX, 378H    ; Set port address
OUT DX, AX      ; Write to ports 378 and 379
OUTS 80186/80286/80386/80486
Output String (8/16p/32)
Syntax
OUTS
Operation


when opcode is (OUTSB, OUTSW, OUTSD) set opsize ← (1, 2, 4)
port(DX) ← DS:[ESI]
if (DF = 0) then
    ESI ← ESI + opsize
else
    ESI ← ESI - opsize
endif


Legal Forms 


OUTSB ; Out string byte 

OUTSW ; Out string word 

OUTSD ; Out string doubleword 
Description 


This instruction outputs the byte, word, or doubleword at offset ESI to the port 
specified in register DX. The ESI register is adjusted by the size of the memory 
operand— incremented if the DF bit is 0 or decremented if DF is 1. 


You can precede the OUTS instruction with the REP prefix; register ECX must then
contain the number of times the OUTS instruction is to be executed.


You can apply one of the segment override prefixes to the OUTS instruction, caus- 
ing the operand to be taken from the specified segment rather than the segment 
pointed to by DS. 


Output instructions are privileged instructions. The protection checks for the OUTS 
instructions are the same as those for the OUT instruction. 


Flags
OF DF IF TF SF ZF AF PF CF

Faults
      PM         RM        V8086
12    #SS(0)               #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
LEA ESI, IO_CHNL_CMD    ; Get pointer to string
MOV DX, CONTROLLER      ; Get I/O port number
MOV ECX, 8              ; Size of I/O string
REP OUTSD               ; Output 8 doublewords
POP 8086/80186/80286/80386/80486
Pop Segment Register (16)
Syntax
POP seg
Operation


seg ← SS:[ESP]
ESP ← ESP + 4


Legal Form
seg
POP sreg


Description 


This instruction pops a 32-bit value off the stack and stores the low-order 16 bits in 
the specified segment register. Register CS is not a valid destination operand, but 
the other segment registers (DS, ES, SS, FS, and GS) are valid. 


The value stored in the segment register must be a valid selector or 0; otherwise, a
protection fault occurs. (Register SS cannot be loaded with a 0.) Note also that a
POP SS instruction has limited usefulness because SS and ESP are both required to
implement a stack. However, if you execute a POP SS, the 80386 inhibits all hardware
interrupts until after the next instruction, so that you can load ESP without an
interrupt arriving while the stack pointer is invalid.


If the POP instruction is executed by a V86 mode task, only 16 bits are popped off 
the stack. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM          RM        V8086
10    #NP(sel)
12    #SS(0)                #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)

Examples
POP GS
POP DS
POP 8086/80186/80286/80386/80486
Pop Value off Stack (16p/32)
Syntax
POP dest
Operation


dest ← SS:[ESP]
if (sizeof(dest) = 16) then
    ESP ← ESP + 2
else
    ESP ← ESP + 4
endif


Legal Forms 


dest 
POP reg 
POP mem 
Description 


This instruction pops the current value at the top-of-stack, stores it in the dest 
operand, and adjusts the stack pointer. 


For optimum performance, keep the stack on a doubleword boundary. Pushing and 
popping 16-bit values might alter this alignment. For this reason, it is preferable to 
sign-extend or zero-extend a 16-bit operand to 32 bits before pushing or popping it. 


When you execute POP in V86 mode, the stack will generally be used only for 16- 
bit values. This does not degrade system performance. Pushing and popping 16-bit 
values leads to problems only when both 32-bit and 16-bit pushes and pops are 
mixed in the same code. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)

Example
POP ECX


POPA 80186/80286/80386/80486 
Pop All General Registers (16) 

Syntax 

POPA 

Operation 

POP DI 

POP SI 

POP BP 

ADD ESP, 2 

POP BX 

POP DX 

POP CX 

POP AX 

Legal Form 

POPA 

Description 


This instruction pops all 16-bit general registers except SP from the stack. Because 
the registers are stored as a 16-byte block of data, the POPA instruction does not 
affect doubleword alignment of the stack. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
POPA 
POPAD 80386/80486
Pop All General Registers (32) 

Syntax 

POPAD 

Operation 

POP EDI 

POP ESI 

POP EBP 

ADD ESP, 4 

POP EBX 

POP EDX 

POP ECX 

POP EAX 


Legal Form 
POPAD 


Description 
This instruction pops all 32-bit general registers except ESP from the stack. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
POPAD 
POPF 8086/80186/80286/80386/80486
Pop Stack into FLAGS (16)
Syntax
POPF
Operation


FLAGS ← SS:[ESP]
ESP ← ESP + 2


Legal Form 
POPF 


Description 


This instruction pops the low-order word of the EFLAGS register from the stack. 
POPF provides compatibility with previous Intel microprocessors. Use the POPFD 
instruction in native-mode programming. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
POPF 
POPFD 80386/80486
Pop Stack into EFLAGS (32)
Syntax
POPFD
Operation


EFLAGS ← SS:[ESP]
ESP ← ESP + 4


Legal Form 
POPFD 


Description 


This instruction pops the top-of-stack into the EFLAGS register. The VM and RF bits
initially present in EFLAGS are not modified. The interrupt flag is modified only if
CPL ≤ IOPL before the POPFD, that is, if the executing procedure has I/O privilege.
The IOPL field is altered only if CPL = 0.
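
The privilege-dependent masking can be expressed as a C sketch. It assumes the standard EFLAGS bit positions (IF in bit 9, IOPL in bits 12 and 13, RF in bit 16, VM in bit 17), models only the rules stated in this entry, and ignores reserved bits; `popfd_result` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

#define EFL_IF   (1u << 9)     /* interrupt flag             */
#define EFL_IOPL (3u << 12)    /* I/O privilege level field  */
#define EFL_RF   (1u << 16)    /* resume flag                */
#define EFL_VM   (1u << 17)    /* virtual 8086 mode flag     */

/* Compute the EFLAGS value left by POPFD, given the old EFLAGS,
   the dword popped from the stack, and the current privilege level. */
uint32_t popfd_result(uint32_t old_eflags, uint32_t popped, unsigned cpl)
{
    assert(cpl <= 3);
    unsigned iopl = (old_eflags & EFL_IOPL) >> 12;

    uint32_t keep = EFL_VM | EFL_RF;     /* never modified by POPFD        */
    if (cpl > iopl)
        keep |= EFL_IF;                  /* no I/O privilege: IF unchanged */
    if (cpl != 0)
        keep |= EFL_IOPL;                /* only CPL 0 may alter IOPL      */

    return (old_eflags & keep) | (popped & ~keep);
}
```

A CPL 3 procedure with IOPL 0 can thus pop any value it likes without changing IF or IOPL; the protected bits are silently retained rather than causing a fault.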


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
POPFD 
PUSH 8086/80186/80286/80386/80486
Push Value onto Stack (8/16p/32)
Syntax
PUSH op
Operation


if (sizeof(op) = 16) then
    ESP ← ESP - 2
else
    ESP ← ESP - 4
endif
SS:[ESP] ← op


Legal Forms
op


PUSH idata
PUSH reg
PUSH sreg
PUSH mem
Description 


This instruction pushes the operand onto the stack. The stack pointer is decre- 
mented before the value is pushed. If the operand is the ESP register, the value 
stored on the stack is the value that ESP had before the instruction was executed. 
(This instruction is different from the 8086 instruction, which pushes the new 
value.) 


Note that pushing 16-bit registers and memory operands onto the stack changes the 
stack’s memory alignment. It is more efficient to sign-extend or zero-extend the 
operand to 32 bits and push the dword. The 80386 uses segment registers to push an 
instruction value onto the stack. 


When you execute the PUSH instruction in V86 mode, segment registers are pushed 
as 16-bit values. The stack will generally be used only for 16-bit values in V86 mode. 
This does not affect system performance because stack misalignment only occurs 
when both 16-bit and 32-bit values are pushed onto the stack. 


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)
14    #PF(ec)              #PF(ec)

Examples
PUSH 7                ; Push data value
MOVSX EAX, AX         ; Sign extend AX
PUSH EAX              ; Then push
PUSH array[ESI*4]     ; Push memory value
PUSHA 80186/80286/80386/80486
Push 16-Bit General Registers (16)

Syntax

PUSHA

Operation

temp ← SP

PUSH AX 

PUSH CX 

PUSH DX 

PUSH BX 

PUSH temp 

PUSH BP 

PUSH SI 

PUSH DI 


Legal Form 
PUSHA 


Description 


This instruction stores a copy of all eight 16-bit registers on the stack. This
instruction provides compatibility with 80186 and 80286 software. Use the PUSHAD
instruction in native-mode environments.


Flags
OF DF IF TF SF ZF AF PF CF 


Faults 

PM RM V8086 
12 #SS(0) 
13 INT 13 #GP(0) 
14 #PF(ec) #PF(ec) 
Example 
PUSHA 
PUSHAD 80386/80486
Push 32-Bit General Registers (32)

Syntax

PUSHAD

Operation

temp ← ESP

PUSH EAX 

PUSH ECX 

PUSH EDX 

PUSH EBX 

PUSH temp 

PUSH EBP 

PUSH ESI 

PUSH EDI 

Legal Form 

PUSHAD 

Description 


This instruction stores a copy of all eight general registers on the stack. The value 
of ESP that is saved to the stack is the ESP value before execution of the PUSHAD 
instruction. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13               INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
PUSHAD 
PUSHF 8086/80186/80286/80386/80486
Push 16-Bit EFLAGS Register (16)


Syntax
PUSHF


Operation
ESP ← ESP - 2
SS:[ESP] ← FLAGS


Legal Form 
PUSHF 


Description 


This instruction pushes the low-order 16 bits of the EFLAGS register onto the stack.
PUSHF provides compatibility with 16-bit processors and causes misalignment of
the stack if used in native mode. 32-bit programs should use PUSHFD instead.


PUSHF causes a general protection fault in V86 mode if the executing procedure's
IOPL is numerically less than 3.


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13                         #GP(0)
14    #PF(ec)
Example 
PUSHF 
PUSHFD 80386/80486
Push EFLAGS Register (32)
Syntax
PUSHFD
Operation


ESP ← ESP - 4
SS:[ESP] ← EFLAGS


Legal Form 
PUSHFD 


Description 


This instruction pushes the contents of the EFLAGS register onto the stack. PUSHFD
causes a general protection fault in V86 mode if IOPL is less than 3.


Flags 
OF DF IF TF SF ZF 


Faults
      PM         RM        V8086
12    #SS(0)
13                         #GP(0)
14    #PF(ec)              #PF(ec)
Example 
PUSHFD 
RCL 8086/80186/80286/80386/80486
Rotate Through Carry Left (8/16p/32)
Syntax


RCL dest, count


Operation


temp ← count & 1FH
if (temp = 1) then
    OF ← (highbit(dest) != CF)
else
    OF ← ?
endif
value ← concatenate(CF, dest)
while (temp != 0)
    x ← highbit(value)
    value ← (value << 1) + x
    temp ← temp - 1
endwhile
CF ← highbit(value)
dest ← value


Legal Forms 


dest count 
RCL reg, idata 
RCL mem, idata 
RCL reg, CL 
RCL mem, CL
Description 


This instruction concatenates the carry flag (CF) with the dest operand and rotates 
the value the specified number of times. A rotation is implemented by shifting the 
value once and transferring the bit shifted off the high end to the low-order position 
of the value. 


The OF bit is defined only if the rotate count is 1. The 80386 and 80486 never rotate
a pattern more than 31 times. Counts greater than 31 are masked by the bit pattern
0000001FH.
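
The rotation through carry can be modeled in C by treating CF and a 32-bit dest as one 33-bit value, as the Operation pseudocode does. This sketch covers only the dword form and does not model OF; `RclResult` and `rcl32` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t dest;   /* rotated operand          */
    unsigned cf;     /* carry flag after rotate  */
} RclResult;

/* Model of RCL on a dword operand. */
RclResult rcl32(uint32_t dest, unsigned cf, unsigned count)
{
    assert(cf <= 1);
    unsigned temp = count & 0x1Fu;               /* count masked to 0..31   */
    uint64_t value = ((uint64_t)cf << 32) | dest; /* CF is bit 32 of value  */

    while (temp != 0) {
        uint64_t x = (value >> 32) & 1u;         /* bit leaving the top     */
        value = ((value << 1) | x) & 0x1FFFFFFFFULL;  /* re-enters at bit 0 */
        temp--;
    }

    RclResult r;
    r.dest = (uint32_t)value;
    r.cf   = (unsigned)(value >> 32);
    return r;
}
```

Because of the mask, a count of 32 leaves both the operand and CF unchanged, and a count of 33 behaves like a count of 1.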


Flags 
OF DF IF TF SF ZF 




Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)
Example 
RCL EAX, 3 ; Rotate EAX 3 bits left 
RCR 8086/80186/80286/80386/80486
Rotate Through Carry Right (8/16p/32)


Syntax


RCR dest, count


Operation


temp ← count & 1FH
if (temp = 1) then
    OF ← (highbit(dest) != highbit(dest << 1))
else
    OF ← ?
endif
value ← concatenate(dest, CF)
while (temp != 0)
    x ← value & 1
    value ← (value >> 1)
    highbit(value) ← x
    temp ← temp - 1
endwhile
CF ← value & 1
dest ← value


Legal Forms


dest count
RCR reg, idata
RCR mem, idata
RCR reg, CL
RCR mem, CL
Description


This instruction concatenates the carry flag (CF) with the dest operand and rotates 
the value the specified number of times. A rotation is implemented by shifting the 
value once and transferring the bit shifted off the low end to the high-order position 
of the value. 


The OF bit is defined only if the rotate count is 1. The 80386 and 80486 never rotate 
a pattern more than 31 times. Counts greater than 31 are masked by the bit pattern 
0000001FH. 


Flags 


OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
RCR EAX, 3    ; Rotate EAX 3 bits right
REP 8086/80186/80286/80386/80486
Repeat String Prefix ()
Syntax 
REP 


Legal Forms 


REP 
REPE 
REPZ 
REPNE 
REPNZ 


Description 


The repeat prefix may be applied to any string instruction (CMPS, INS, LODS,
MOVS, OUTS, SCAS, STOS). When the prefix is present, the string instruction
executes repeatedly based on the count value in the ECX register. The ZF flag is also
tested when executing CMPS or SCAS.


If ECX is 0 when a repeated string instruction is encountered, the string instruction
will not be executed.


Refer to the individual string instructions in this chapter for additional information. 


Flags 
OF DF IF TF SF ZF 


Faults
     PM       RM
6    #UD()    INT 6

Example
MOV EAX, 0
MOV ECX, 1024/4
REP STOSD          ; Initialize 1 KB of memory to 0
RET 8086/80186/80286/80386/80486
Near Return from Subroutine ()


Syntax
RET count


Operation
EIP ← pop()
ESP ← ESP + count


Legal Forms 


count 
RET 
RET idata 
Description 


This instruction restores the instruction pointer to the value it held before the 
previous CALL instruction. The value of EIP that had been saved on the stack is 
popped. If the count operand is present, the count value is added to ESP, removing 
any operands that were pushed onto the stack for the subroutine call. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
Example 
RET 4 ; Return and pop one dword 
RETF 8086/80186/80286/80386/80486
Far Return from Subroutine ()


Syntax
RETF count


Operation


EIP ← pop()
CS ← pop()
ESP ← ESP + count


Legal Forms 


count 


RETF 
RETF idata 


Description 


This variation of the RET instruction pops both a new CS and EIP from the stack.
The instruction assumes that the CS value is stored as the low-order 16 bits of a
dword on the stack.


If this instruction causes a privilege-level transition, the protection checks de- 
scribed in Chapter 5 take place. 


Flags 
OF DF IF TF SF ZF 


Faults
      PM          RM        V8086
10    #NP(sel)
12    #SS(0)
13    #GP(0)      INT 13    #GP(0)
14    #PF(ec)               #PF(ec)

Example
RETF    ; Far return
ROL 8086/80186/80286/80386/80486
Rotate Left (8/16p/32)
Syntax


ROL dest, count


Operation


temp ← count & 1FH
if (temp = 1) then
    OF ← (highbit(dest) != CF)
else
    OF ← ?
endif
while (temp != 0)
    x ← highbit(dest)
    dest ← (dest << 1) + x
    temp ← temp - 1
endwhile
CF ← highbit(dest)


Legal Forms 


dest count 
ROL reg, idata 
ROL mem, idata 
ROL reg, CL 
ROL mem, CL 
Description 


This instruction rotates the dest operand the specified number of times. A rotation 
is implemented by shifting the value once and transferring the bit shifted off the 
high end to the low-order position of the value. 


The OF bit is defined only if the rotate count is 1. The 80386 and 80486 never rotate
a pattern more than 31 times. Counts greater than 31 are masked by the bit pattern
0000001FH.


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
ROL EAX, 3    ; Rotate EAX 3 bits left
ROR 8086/80186/80286/80386/80486
Rotate Right (8/16p/32)
Syntax


ROR dest, count


Operation


temp ← count & 1FH
if (temp = 1) then
    OF ← (highbit(dest) != highbit(dest << 1))
else
    OF ← ?
endif
while (temp != 0)
    x ← dest & 1
    dest ← (dest >> 1)
    highbit(dest) ← x
    temp ← temp - 1
endwhile
CF ← highbit(dest)


Legal Forms 


dest count 
ROR reg, idata 
ROR mem, idata 
ROR reg, CL 
ROR mem, CL 
Description 


This instruction rotates the dest operand the specified number of times. A rotation 
is implemented by shifting the value once and transferring the bit shifted off the low 
end to the high-order position of the value. 


The OF bit is defined only if the rotate count is 1. The 80386 and 80486 never rotate
a pattern more than 31 times. Counts greater than 31 are masked by the bit pattern
0000001FH.


Flags 
OF DF IF TF SF ZF 
Faults
      PM         RM        V8086
12    #SS(0)
13    #GP(0)     INT 13    #GP(0)
14    #PF(ec)              #PF(ec)
17    #AC(0)               #AC(0)

Example
ROR EAX, 3    ; Rotate EAX 3 bits right
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SAHF 8086/80186/80286/80386/80486 
Store AH in EFLAGS (8) 
Syntax 
SAHF 
Operation 


EFLAGS ← (EFLAGS & 0FFFFFF2AH) | (AH & 0D5H)


Legal Form 
SAHF 


Description 


This instruction loads bits 7, 6, 4, 2, and 0 of the AH register into the corresponding
bits of the EFLAGS register (SF, ZF, AF, PF, and CF).


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 


None. 


Example 
SAHF


SAL 8086/80186/80286/80386/80486
Shift Left Arithmetic (8/16p/32) 
Syntax 


SAL dest, count 


Operation 


temp ← count & 1FH
while (temp != 0)
    CF ← highorder(dest)
    dest ← dest << 1
    temp ← temp - 1
end
if count = 1 then
    OF ← highorder(dest) != CF
else
    OF ← ?
endif


Legal Forms 


dest count 
SAL reg, idata 
SAL mem, idata 
SAL reg, CL 
SAL mem, CL 
Description 


This instruction shifts the dest operand count bits to the left. The arithmetic shift 
left (SAL) and logical shift left (SHL) are equivalent instructions. 


The count operand must either be an immediate data value or be stored in register 
CL. The 80386 and 80486 mask the count operand with 1FH so that the count value 
is never greater than 31. 


If the count operand is 1, the overflow flag is reset to 0 when the high-order bit and 
the carry flag have the same value after the shift. If the high-order bit and CF have 
different values, OF is set to 1. If count is greater than 1, OF is undefined. 


A left shift is equivalent to multiplying the dest operand by 2^count.
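
The relationship between a left shift and multiplication by a power of 2 can be checked with a small Python model. The function name sal and its width parameter are assumptions made for this sketch; the result is truncated to the operand width, so the equivalence holds modulo 2^width.

```python
def sal(dest, count, width=32):
    """Return (result, cf) for a left shift; cf is None when no shift occurs."""
    count &= 0x1F                          # the count is masked with 1FH
    mask = (1 << width) - 1
    dest &= mask
    cf = None                              # CF is left unchanged by a zero count
    for _ in range(count):
        cf = (dest >> (width - 1)) & 1     # high-order bit moves into CF
        dest = (dest << 1) & mask
    return dest, cf
```

For example, sal(5, 3) yields 40, which is 5 multiplied by 2^3, and a count of 32 is masked to 0 and leaves the operand untouched.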


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


Examples
SAL ECX, 7
SAL WORD PTR [EBP+8], CL


SAR 8086/80186/80286/80386/80486 
Shift Right Arithmetic (8/16p/32) 
Syntax 


SAR dest, count 


Operation 


temp ← count & 1FH
while (temp != 0)
    save ← highorder(dest)
    CF ← dest & 1
    dest ← dest >> 1
    highorder(dest) ← save
    temp ← temp - 1
end
if count = 1 then
    OF ← 0
else
    OF ← ?
endif


Legal Forms 


dest count 
SAR reg, idata 
SAR mem, idata 
SAR reg, CL 
SAR mem, CL 
Description 


This instruction shifts the dest operand count bits to the right. The shift is called 
arithmetic because it preserves the sign bit of the dest operand. 


The count operand must be an immediate data value or it must be stored in register 
CL. The 80386 and 80486 mask the count operand with 1FH so that the count value 
is never greater than 31. 


If count is 1, the overflow flag is reset to 0. If count is greater than 1, OF is undefined.


The arithmetic right shift is similar to dividing dest by 2^count except that negative
values are rounded toward negative infinity, rather than toward 0 (that is, -3 shifted
right 1 yields -2, whereas -3 divided by 2^1 rounds to -1).
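
The rounding difference between an arithmetic right shift and signed division can be reproduced with a short Python sketch. The helper names sar and to_signed are assumptions made for illustration, and flags are not modeled.

```python
def sar(dest, count, width=32):
    """Arithmetic right shift of a two's complement value of width bits."""
    count &= 0x1F                              # the count is masked with 1FH
    sign = 1 << (width - 1)
    dest &= (1 << width) - 1
    for _ in range(count):
        dest = (dest >> 1) | (dest & sign)     # the sign bit is preserved
    return dest


def to_signed(value, width=32):
    """Interpret an unsigned bit pattern of width bits as a signed integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value
```

With these helpers, to_signed(sar(-3 & 0xFFFFFFFF, 1)) gives -2, whereas int(-3 / 2), which truncates toward 0 as IDIV does, gives -1.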


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


Examples
SAR ECX, 7
SAR WORD PTR [EBP+8], CL


SBB 8086/80186/80286/80386/80486
Subtraction with Borrow (8/16p/32) 
Syntax 


SBB dest, src 


Operation 


dest ← dest - src - CF


Legal Forms 


dest src 
SBB reg, idata 
SBB mem, idata 
SBB reg, reg 
SBB reg, mem 
SBB mem, reg 
Description 


This instruction subtracts the src operand from the dest operand and decrements 
the dest operand by 1 if the CF flag is set. The result is stored in dest. 
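
The way SBB chains a borrow from a preceding SUB can be modeled in Python. This is a sketch, not the processor's definition: the names sub32, sub64, and MASK32 are hypothetical, and only CF is modeled.

```python
MASK32 = 0xFFFFFFFF


def sub32(dest, src, borrow_in=0):
    """Return (result, cf) for dest - src - borrow_in on 32-bit operands."""
    diff = dest - src - borrow_in
    return diff & MASK32, 1 if diff < 0 else 0   # CF is set when a borrow occurs


def sub64(edx, eax, ebx, ecx):
    """Model EDX:EAX - EBX:ECX with SUB on the low halves and SBB on the high."""
    eax, cf = sub32(eax, ecx)                    # SUB EAX, ECX
    edx, cf = sub32(edx, ebx, cf)                # SBB EDX, EBX
    return edx, eax, cf
```

Subtracting 1 from 0000000100000000H, for instance, produces a borrow out of the low dword that the high-order subtraction then consumes.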


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
; 64-bit subtraction operation EDX:EAX - EBX:ECX
SUB EAX, ECX ; Low-order bits
SBB EDX, EBX ; High-order bits


SCAS 8086/80186/80286/80386/80486
Scan String (8/16p/32) 
Syntax 
SCAS 
Operation 


when opcode is (SCASB, SCASW, SCASD) set opsize ← (1, 2, 4)
NULL ← acc - ES:[EDI]
if (DF = 0) then
    EDI ← EDI + opsize
else
    EDI ← EDI - opsize
endif


Legal Forms 


SCASB ; Scan string byte 

SCASW ; Scan string word 

SCASD ; Scan string doubleword 
Description 


This instruction compares the value in the accumulator (AL, AX, or EAX) with the 
operand at ES:[EDI]. The flags are set according to the compare operation, and the 
EDI register is adjusted by the size of the operand. If the direction flag (DF) is 0, 
EDI is incremented; otherwise, it is decremented. 


You can apply the REPE or REPNE prefix to the SCAS instruction. The ECX register 
contains a repeat count, indicating the maximum number of times the instruction 
should be repeated. The instruction will repeat only while the repeat condition is 
true, that is, when ZF = 1 for REPE (REPZ) or ZF = 0 for REPNE (REPNZ). 
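
The interaction of the REPNE prefix, the ECX count, and ZF can be traced with a small Python model. The function name repne_scasb is an assumption, memory stands in for the segment addressed by ES, and only ZF is modeled.

```python
def repne_scasb(memory, edi, ecx, al, df=0, zf=0):
    """Model REPNE SCASB; returns (edi, ecx, zf) after the scan ends."""
    step = -1 if df else 1
    while ecx != 0:                            # ECX limits the number of repeats
        zf = 1 if memory[edi] == al else 0     # compare AL with ES:[EDI]
        edi += step
        ecx -= 1
        if zf == 1:                            # REPNE repeats only while ZF = 0
            break
    return edi, ecx, zf
```

When a match is found, EDI has already been adjusted past the matching byte, so the match is at EDI - 1 for a forward scan. If ECX is 0 on entry, the flags are returned unchanged.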


You cannot use a segment override prefix with SCAS. The ES register is always the 
destination of the string to be scanned. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
; Search for an asterisk in a string
LES EDI, [EBP+12] ; String pointer on stack
MOV ECX, [EBP+20] ; String size on stack
CLD
MOV AL, '*' ; Character to search for
REPNE SCASB ; Scan
JE MATCH ; Branch if found


seg 8086/80186/80286/80386/80486
Segment Override Prefix ()


Legal Forms


CS:
DS:
SS:
ES:
FS:
GS: 
Description 


The instruction that follows these prefixes takes its memory operand from the 
specified segment rather than from the default segment. 


You cannot override the following string instructions: 
INS 

SCAS 

STOS 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults 

None. 

Example 

MOV EAX, FS:[ESI] ; Read from FS rather than DS
ADD DS:[EBP], 7 ; Write to DS rather than SS


SETcc 80386/80486
Set Byte on Condition (8) 


Syntax 
SETcc dest 


Operation 


if (cc) then
    dest ← 1
else
    dest ← 0
endif


Legal Forms 


SETA   dest ; Set if above (unsigned x > y) / CF = 0 & ZF = 0
SETAE  dest ; Set if above or equal / CF = 0
SETB   dest ; Set if below (unsigned x < y) / CF = 1
SETBE  dest ; Set if below or equal / CF = 1 | ZF = 1
SETC   dest ; Set if carry / CF = 1
SETE   dest ; Set if equal / ZF = 1
SETG   dest ; Set if greater (signed x > y) / SF = OF & ZF = 0
SETGE  dest ; Set if greater or equal / SF = OF
SETL   dest ; Set if less (signed x < y) / SF != OF
SETLE  dest ; Set if less or equal / SF != OF | ZF = 1
SETNA  dest ; Set if not above (SETBE)
SETNAE dest ; Set if not above or equal (SETB)
SETNB  dest ; Set if not below (SETAE)
SETNBE dest ; Set if not below or equal (SETA)
SETNC  dest ; Set if no carry / CF = 0
SETNE  dest ; Set if not equal / ZF = 0
SETNG  dest ; Set if not greater (SETLE)
SETNGE dest ; Set if not greater or equal (SETL)
SETNL  dest ; Set if not less (SETGE)
SETNLE dest ; Set if not less or equal / SF = OF & ZF = 0
SETNO  dest ; Set if no overflow / OF = 0
SETNP  dest ; Set if no parity / PF = 0
SETNS  dest ; Set if no sign / SF = 0
SETNZ  dest ; Set if not 0 / ZF = 0
SETO   dest ; Set if overflow / OF = 1
SETP   dest ; Set if parity / PF = 1
SETPE  dest ; Set if parity even / PF = 1
SETPO  dest ; Set if parity odd / PF = 0
SETS   dest ; Set if sign / SF = 1
SETZ   dest ; Set if 0 / ZF = 1


Description 


This instruction sets the dest byte to 1 if the condition described by the opcode is 
met; otherwise, the instruction clears the byte to 0. 
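
How a condition code selects the byte value can be sketched in Python for a subset of the conditions. The CONDITIONS table and the setcc function are hypothetical names for this sketch, and the flags are passed in as a plain dictionary rather than read from EFLAGS.

```python
# Each entry maps a condition suffix to a predicate over the flag values.
CONDITIONS = {
    "A":  lambda f: f["CF"] == 0 and f["ZF"] == 0,
    "AE": lambda f: f["CF"] == 0,
    "B":  lambda f: f["CF"] == 1,
    "BE": lambda f: f["CF"] == 1 or f["ZF"] == 1,
    "E":  lambda f: f["ZF"] == 1,
    "NE": lambda f: f["ZF"] == 0,
    "G":  lambda f: f["SF"] == f["OF"] and f["ZF"] == 0,
    "GE": lambda f: f["SF"] == f["OF"],
    "L":  lambda f: f["SF"] != f["OF"],
    "LE": lambda f: f["SF"] != f["OF"] or f["ZF"] == 1,
}


def setcc(cc, flags):
    """Return the byte that SETcc would store: 1 if the condition holds, else 0."""
    return 1 if CONDITIONS[cc](flags) else 0
```

After comparing two equal values (ZF = 1, CF = 0, SF = OF), setcc("E", flags) and setcc("LE", flags) return 1 while setcc("G", flags) returns 0.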


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)              #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
Example
SETNZ AL
MOVZX EAX, AL


SGDT 80286/80386/80486
Store GDT Register ()
Syntax
SGDT dest
Operation


dest ← GDTR.LIMIT
dest + 2 ← GDTR.BASE


Legal Form 
dest 
SGDT mem 


Description 


This instruction writes the limit portion of the GDTR to the dest memory address 
and writes the linear base address of the GDT to the dword at dest + 2. 
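
The 6-byte memory image that results (a 16-bit limit followed by a 32-bit base) can be illustrated with Python's struct module. The function names sgdt_image and parse_sgdt_image and the sample values in the usage note are assumptions made for this sketch.

```python
import struct


def sgdt_image(limit, base):
    """Pack a 16-bit limit at offset 0 and a 32-bit base at offset 2, little-endian."""
    return struct.pack("<HI", limit, base)


def parse_sgdt_image(image):
    """Recover (limit, base) from the first 6 bytes of a stored image."""
    limit, base = struct.unpack("<HI", image[:6])
    return limit, base
```

For a hypothetical GDT at linear address 00012000H with limit 03FFH, the image is the byte sequence FF 03 00 20 01 00. The same layout applies to the image stored by SIDT.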


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6*    #UD()     INT 6     #UD()
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


* The undefined opcode fault occurs only when the dest
operand is encoded as a register.


Example
SGDT [300H] ; Save GDTR


SHL 8086/80186/80286/80386/80486
Shift Left Logical (8/16p/32) 
Syntax 


SHL dest, count 


Operation 


temp ← count & 1FH
while (temp != 0)
    CF ← highorder(dest)
    dest ← dest << 1
    temp ← temp - 1
end
if count = 1 then
    OF ← highorder(dest) != CF
else
    OF ← ?
endif


Legal Forms 


dest count 
SHL reg, idata 
SHL mem, idata 
SHL reg, CL 
SHL mem, CL 
Description 


This instruction shifts the dest operand count bits to the left. The arithmetic left 
shift (SAL) and logical left shift (SHL) are equivalent instructions. 


The count operand must either be an immediate data value or be stored in register 
CL. The 80386 and 80486 mask the count operand with 1FH so that the count value 
is never greater than 31. 


If the count operand is 1, the overflow flag is reset to 0 when the high-order bit and 
the carry flag have the same value after the shift. If the high-order bit and CF have 
different values, OF is set to 1. If count is greater than 1, OF is undefined. 


A left shift is equivalent to multiplying the dest operand by 2^count.


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
SHL ECX, 7
SHL WORD PTR [EBP+8], CL


SHLD 80386/80486
Shift Left Double (16p/32) 
Syntax 
SHLD dest, src, count 


Operation 


temp ← count & 1FH
value ← concatenate(dest, src)
value ← value << temp
dest ← high-order half of value


Legal Forms


dest src count
SHLD reg, reg, idata
SHLD mem, reg, idata
SHLD reg, reg, CL
SHLD mem, reg, CL
Description


This instruction concatenates the src operand to the dest operand and shifts the
resulting double-size value left. The high-order bits are stored in dest.


The count operand is masked with 1FH so that no shift counts greater than 31 are 
used. 
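
The double-shift operation can be modeled in Python for 32-bit operands. The function names shld and shrd are assumptions for this sketch, and flags are not modeled. On the processor, SHLD leaves the high-order half of the shifted double-size value in dest, and its counterpart SHRD leaves the low-order half.

```python
def shld(dest, src, count, width=32):
    """High-order width bits of (dest:src) after shifting left by count."""
    count &= 0x1F                                        # count is masked with 1FH
    mask = (1 << width) - 1
    value = ((dest & mask) << width) | (src & mask)      # dest is the high half
    value = (value << count) & ((1 << (2 * width)) - 1)
    return value >> width


def shrd(dest, src, count, width=32):
    """Low-order width bits of (src:dest) after shifting right by count."""
    count &= 0x1F
    mask = (1 << width) - 1
    value = ((src & mask) << width) | (dest & mask)      # src is the high half
    return (value >> count) & mask
```

A 64-bit left shift of a value held as two dwords can then be composed as shld(high, low, n) for the new high dword and an ordinary left shift of the low dword.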


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
MOV EAX, [ESI] ; Get low-order dword
SHLD [ESI+4], EAX, 7 ; Shift high-order dword left, filling from EAX


SHR 8086/80186/80286/80386/80486
Shift Right Logical (8/16p/32) 
Syntax 


SHR dest, count 


Operation 


temp ← count & 1FH
save ← highorder(dest)
while (temp != 0)
    CF ← dest & 1
    dest ← dest >> 1
    temp ← temp - 1
end
if count = 1 then
    OF ← save
else
    OF ← ?
endif


Legal Forms 


dest count 
SHR reg, idata 
SHR mem, idata
SHR reg, CL 
SHR mem, CL 
Description 


This instruction shifts the dest operand count bits to the right. The high-order bits 
are cleared to 0 as the low-order bits are shifted. 


The count operand must either be an immediate data value or be stored in register 
CL. The 80386 and 80486 mask the count operand with 1FH so that the count value 
is never greater than 31. 


If the count operand is 1, the overflow flag is set to the high-order bit of the original
dest operand. If count is greater than 1, OF is undefined.


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


Examples
SHR ECX, 7
SHR WORD PTR [EBP+8], CL


SHRD 80386/80486
Shift Right Double (16p/32) 
Syntax 


SHRD dest, src, count 


Operation 


temp ← count & 1FH
value ← cat(src, dest)
value ← value >> temp
dest ← low-order half of value


Legal Forms 


dest src count
SHRD reg, reg, idata 
SHRD mem, reg, idata 
SHRD reg, reg, CL 
SHRD mem, reg, CL 
Description 


This instruction concatenates the src operand to the dest operand and shifts the 
resulting double-size value right. The low-order bits are stored in dest. 


The count operand is masked with 1FH so that no shift counts greater than 31 are 
used. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
MOV EAX, [002AH] ; Get low-order dword
SHRD EAX, [002EH] ; 64-bit shift


SIDT 80286/80386/80486
Store IDT Register ()
Syntax
SIDT dest
Operation


dest ← IDTR.LIMIT
dest + 2 ← IDTR.BASE


Legal Form 


dest 
SIDT mem 
Description 


This instruction writes the limit portion of the IDTR to the dest memory address 
and the linear base address of the IDT to the dword at dest + 2. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6*    #UD()     INT 6     #UD()
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


* The undefined opcode fault occurs only when the dest
operand is encoded as a register.


Example
SIDT int_tab ; Get address and limit of IDT


SLDT 80286/80386/80486
Store LDT Register (16) 


Syntax 
SLDT dest 


Operation 
dest ← LDTR


Legal Forms 


dest 
SLDT reg 
SLDT mem 
Description 


This instruction stores the selector in the LDTR in the destination location. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6               INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)
17    #AC(0)              #AC(0)
Example
SLDT DX ; Put LDT selector into DX


SMSW 80286/80386/80486
Store Machine Status Word (16) 


Syntax 
SMSW dest 


Operation 
dest ← MSW


Legal Forms 


dest 
SMSW reg 
SMSW mem 
Description 


This instruction stores the low-order 16 bits of register CR0 (the 80286 machine
status word) in the dest operand.


This instruction is provided for compatibility only. Use the MOV CR0 instruction in
native mode programming.


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
SMSW [DI]


STC 8086/80186/80286/80386/80486
Set Carry Flag ()


Syntax
STC


Operation
CF ← 1


Legal Form 
STC 


Description 
This instruction sets the carry flag (CF) in the EFLAGS register to 1. 


Flags 
OF DF IF TF SF ZF AF PF CF


Faults


None.


Example
STC ; Carry flag set to 1


STD 8086/80186/80286/80386/80486
Set Direction Flag ()


Syntax
STD


Operation
DF ← 1


Legal Form 
STD 


Description 


This instruction sets the direction flag (DF) in the EFLAGS register to 1. When
DF = 1, the string instructions operate in the reverse direction, decrementing the
index registers after each operation.


Flags
OF DF IF TF SF ZF AF PF CF


Faults


None.


Example
STD ; Prepare for reverse string operation


STI 8086/80186/80286/80386/80486
Set Interrupt Flag ()


Syntax
STI


Operation
IF ← 1


Legal Form 
STI 


Description 
This instruction sets the interrupt flag (IF) in the EFLAGS register to 1, enabling 
hardware interrupts. 


The executing program must have a high enough privilege (CPL <= IOPL) to issue the
STI instruction; otherwise, a general protection fault occurs.


Flags
OF DF IF TF SF ZF AF PF CF


Fault
      PM        RM        V8086
13    #GP(0)

Example
CLI ; Disable interrupts
MOV AL, semaphore ; Get memory value 
DEC AL ; Decrement counter 
JS DONE ; Skip if value was 0
MOV semaphore, AL ; Update 

DONE: 
STI ; Reenable interrupts


STOS 8086/80186/80286/80386/80486
Store String (8/16p/32)


Syntax 
STOS 


Operation 


when opcode is (STOSB, STOSW, STOSD) set opsize ← (1, 2, 4)
ES:[EDI] ← accum
if (DF = 0) then
    EDI ← EDI + opsize
else
    EDI ← EDI - opsize
endif


Legal Forms 


STOSB ; Store string byte 
STOSW ; Store string word 
STOSD ; Store string doubleword 
Description 


This instruction writes the current contents of the accumulator (AL, AX, or EAX, 
depending on the opcode used) to the memory location pointed to by ES:EDI. It 
then increments or decrements EDI by the size of the operand, according to the DF 
bit in the EFLAGS register. 


If you precede the STOS instruction with the REP prefix, register ECX must contain 
a count of the number of times STOS is to be executed. This fills memory with the 
value in the accumulator. 
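
The memory-fill behavior of REP STOSD can be sketched in Python, with a bytearray standing in for the segment addressed by ES. The function name rep_stosd is an assumption made for illustration.

```python
import struct


def rep_stosd(memory, edi, ecx, eax, df=0):
    """Model REP STOSD on a bytearray; returns the final (edi, ecx)."""
    step = -4 if df else 4
    while ecx != 0:                                            # ECX holds the repeat count
        struct.pack_into("<I", memory, edi, eax & 0xFFFFFFFF)  # ES:[EDI] receives EAX
        edi += step
        ecx -= 1
    return edi, ecx
```

Clearing 100 bytes therefore takes a count of 100 / 4 = 25 dword stores and leaves EDI pointing just past the filled region.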


You cannot use a segment override prefix with the STOS instruction. The destination
segment will always be selected by ES.


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
; Clear 100 bytes of memory beginning at location 0
MOV EDI, 0 ; Base address
MOV ECX, 100 / 4 ; Count (in dwords)
XOR EAX, EAX ; Clear accumulator to 0
CLD
REP STOSD ; Zero memory


STR 80286/80386/80486
Store Task Register (16) 


Syntax 
STR dest 


Operation 
dest ← TR


Legal Forms 


dest 
STR reg 
STR mem 
Description 


This instruction stores the task register selector in dest. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6               INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)
17    #AC(0)              #AC(0)
Example
STR CX ; Store current task's selector


SUB 8086/80186/80286/80386/80486
Subtraction (8/16p/32) 
Syntax 


SUB dest, src 


Operation 


dest ← dest - src


Legal Forms


dest src
SUB reg, idata
SUB mem, idata
SUB reg, reg
SUB reg, mem
SUB mem, reg
Description 


This instruction subtracts the src operand from the dest operand and stores the 
result in dest. 


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
; 64-bit subtraction operation EDX:EAX - EBX:ECX
SUB EAX, ECX ; Low-order bits
SBB EDX, EBX ; High-order bits with possible borrow


TEST 8086/80186/80286/80386/80486
Test Bits (8/16p/32) 
Syntax 


TEST dest, src 


Operation 
NULL ← dest & src


Legal Forms 


dest src 
TEST reg, idata 
TEST mem, idata 
TEST reg, reg 
TEST reg, mem 
TEST mem, reg 
Description 


This instruction performs a bit-by-bit AND operation on the src and dest operands 
and discards the result. The flag bits, however, are set as they would be after an 
AND instruction. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
TEST AL, 0FH ; Check if any bits set in
             ; low nibble of AL
TEST EBX, ECX ; Test EBX under mask in ECX
TEST WORD PTR [EBP+6], 8000H ; Check whether
             ; 16-bit integer is negative


VERR 80286/80386/80486
Verify Read Access (16) 
Syntax 


VERR select 


Operation 


if (accessible(select) & read_access(select)) then
    ZF ← 1
else
    ZF ← 0
endif


Legal Forms 


select 
VERR reg 
VERR mem 
Description 


This instruction sets the ZF bit in EFLAGS to 1 if the current procedure can load the 
select operand into DS, ES, FS, or GS and can read a value from the memory seg- 
ment without causing a privilege violation. 


If the selector is for a descriptor that is not a memory segment, if the memory segment
is not readable, or if the current procedure does not have a high enough privilege 
level to gain access to the segment, VERR clears ZF to 0. The VERR instruction does 
not generate a fault for referring to a selector that is invalid; however, a fault occurs if 
the instruction operand is a memory operand and the operand address is invalid. 


Note that this instruction does not check the "present" bit of the descriptor, nor does
it check access at the page protection level (U/S and R/W bits of page table entries).


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6               INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)
17    #AC(0)              #AC(0)
Example
VERR WORD PTR [EBP+8] ; Check selector on stack
JZ CONTINUE ; Branch if OK
STC ; Set carry flag
LEAVE
RETF ; And return if selector is invalid
CONTINUE:


VERW 80286/80386/80486
Verify Write Access (16) 
Syntax 


VERW select 


Operation 


if (accessible(select) & write_access(select)) then
    ZF ← 1
else
    ZF ← 0
endif


Legal Forms 


select 
VERW reg 
VERW mem 
Description 


This instruction sets the ZF bit in EFLAGS to 1 if the current procedure can load the 
select operand into DS, SS, ES, FS, or GS and can write a value to the memory seg- 
ment without causing a privilege violation. 


If the selector is for a descriptor that is not a memory segment, if the memory seg- 
ment is not writable, or if the current procedure does not have a high enough privi- 
lege level to gain access to the segment, VERW clears ZF to 0. The VERW 
instruction does not generate a fault for referring to a selector that is invalid; how- 
ever, a fault occurs if the instruction operand is a memory operand and the operand 
address is invalid.


Note that this instruction does not check the "present" bit of the descriptor, nor does
it check access at the page protection level (U/S and R/W bits of page table entries).


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
6               INT 6     #UD()
12    #SS(0)
13    #GP(0)
14    #PF(ec)
17    #AC(0)              #AC(0)
Example
VERW WORD PTR [EBP+8] ; Check selector on stack
JZ CONTINUE ; Branch if OK
STC ; Set carry flag
LEAVE
RET ; And return if selector is invalid
CONTINUE:


WAIT 8086/80186/80286/80386/80486
Wait Until Not Busy ()


Syntax
WAIT 


Legal Form 
WAIT 


Description 


This instruction places the 80386 into an idle state until the BUSY\ pin is inactive. If 
the BUSY\ pin is inactive when the instruction executes, no idle occurs. The BUSY\ 
pin is usually connected to a numeric coprocessor. You should execute this instruc- 
tion before any 80386 instruction that will access a value stored by the coprocessor. 


If both the TS (task switched) bit in register CR0 and the MP (monitor coprocessor)
bit are set, a coprocessor fault occurs. If the ERROR\ pin of the 80386 is active,
indicating an unmasked exception on the coprocessor, a math fault occurs.


The 80486 has no BUSY\ pin because the numeric processor is integrated into the 
CPU. In the 80486, the WAIT instruction is used to force the floating-point unit to 
check for unmasked exceptions, the existence of which will cause a math fault. 


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
7     #NM()     INT 7     #NM()
16    #MF()     INT 16    #MF()
Example
FST result ; Store floating-point result
WAIT ; Wait for coprocessor to finish
PUSH result ; Push the result onto the stack
CALL fp_print ; Print the value


WBINVD 80486
Write-Back and Invalidate Cache ()


Syntax 
WBINVD 


Operation 


Invalidate cache 


Legal Form 
WBINVD 


Description 


Internal to the 80486, this instruction is identical to INVD. However, it causes a
special “write-back” bus cycle to be issued before the external-cache-flush bus 
cycle. This allows an external cache to write back its contents to main memory. 


Flags 
OF DF IF TF SF ZF 


Faults 


None. 


Example 


WBINVD ; Invalidate and signal external write-back


XADD 80486
Exchange and Add (8/16p/32) 
Syntax 


XADD dest, src 


Operation 


temp ← dest
dest ← temp + src
src ← temp


Legal Forms


dest src
XADD reg, reg
XADD mem, reg
Description 


The sum of dest and src is computed and stored into dest. The original value of dest 
is stored into src. The flags are set according to the standard rules for an ADD 
instruction. 


When preceded by the LOCK prefix, this instruction is very useful for 
multiprocessor semaphore operations. 
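
The exchange-and-add behavior can be modeled in Python as an atomic fetch-and-add. This is a sketch only: the Cell class is hypothetical, and a threading.Lock stands in for the bus lock that the LOCK prefix provides on real hardware.

```python
import threading


class Cell:
    """A memory operand; the lock stands in for the LOCK prefix."""

    def __init__(self, value=0, width=32):
        self.value = value
        self.mask = (1 << width) - 1
        self._lock = threading.Lock()

    def xadd(self, src):
        """Add src to the cell and return the cell's previous value."""
        with self._lock:
            old = self.value                      # temp receives dest
            self.value = (old + src) & self.mask  # dest receives temp + src
            return old                            # src receives temp
```

Because each caller receives the value that was in the cell before its own addition, concurrent callers obtain distinct values, which is the property a semaphore or ticket counter relies on.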


Flags 
OF DF IF TF SF ZF AF PF CF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Example
MOV AL,1 ; Semaphore increment value
XADD sema,AL ; Increment
JB failed ; Semaphore < 0


XCHG 8086/80186/80286/80386/80486
Exchange (8/16p/32)
Syntax 


XCHG op1, op2


Operation
temp ← op1
op1 ← op2
op2 ← temp
Legal Forms


op1 op2


XCHG reg, reg 
XCHG reg, mem 
XCHG mem, reg 
Description 


This instruction swaps the contents of two operands. If either operand is a memory
operand, the bus LOCK\ signal is held active during the read and write memory
cycles.
Flags
OF DF IF TF SF ZF AF PF CF
Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)
Examples
XCHG EAX, ECX ; Swap EAX and ECX
XCHG AL, [ESI+10] ; Exchange AL with memory


XLATB 8086/80186/80286/80386/80486
Translate Byte (8)
Syntax
XLATB
Operation


AL ← DS:[EBX+AL]


Legal Form 
XLATB 


Description 
This instruction uses the value of AL as a positive index into a table located at 
DS:EBX. It then stores the indexed table byte in AL, replacing the original value. 


You can apply a segment override prefix to XLATB so that the table access location 
will be at EBX + AL in the specified segment. 
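
The table lookup can be sketched in Python, where a 256-entry bytes object plays the role of the table at DS:EBX. The names xlatb and translate are assumptions, and the uppercase-conversion table in the usage note is a hypothetical stand-in for a real translation table.

```python
def xlatb(table, al):
    """Return table[AL], treating AL as an unsigned byte index."""
    return table[al & 0xFF]


def translate(source, table):
    """Translate bytes up to and including the first byte that maps to NUL."""
    out = bytearray()
    for byte in source:               # LODSB: fetch the next source byte
        translated = xlatb(table, byte)
        out.append(translated)        # STOSB: store the translated byte
        if translated == 0:           # stop once the stored byte is NUL
            break
    return bytes(out)
```

With a table that maps lowercase ASCII letters to uppercase and leaves other bytes alone, translate(b"abc\0xyz", table) produces b"ABC\0".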


Flags 
OF DF IF TF SF ZF 


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
Example
LEA EBX, A2E_TAB ; Load offset of ASCII to EBCDIC table
LDS ESI, SRC ; Load source string pointer
LES EDI, DEST_BUFF ; Load destination string pointer
CLD ; Set DF = 0
L1: LODSB ; Get byte of source string
CS: ; Assume translate table resides in CS
XLATB ; Translate byte
STOSB ; Store resulting character
OR AL, AL ; Test for NUL character
JNZ L1 ; Loop if not NUL


XOR 8086/80186/80286/80386/80486
Boolean Exclusive OR (8/16p/32)


Syntax 
XOR dest, src 


Operation 


dest ← dest ^ src


Legal Forms 


dest Src 
XOR reg, idata 
XOR mem, idata 
XOR reg, reg 
XOR reg, mem 
XOR mem, reg 
Description 


This instruction performs a bit-by-bit exclusive OR operation on the src and dest
operands, storing the result in the dest operand. The XOR operation is defined as
follows:

0 ^ 0 = 0
0 ^ 1 = 1
1 ^ 0 = 1
1 ^ 1 = 0


Flags
OF DF IF TF SF ZF AF PF CF


Faults
      PM        RM        V8086
12    #SS(0)
13    #GP(0)    INT 13    #GP(0)
14    #PF(ec)             #PF(ec)
17    #AC(0)              #AC(0)


Examples
XOR AL, 0FFH ; Change 0s to 1s and vice versa in AL
XOR EBX, ECX ; Compute EBX ← EBX ^ ECX


Floating-Point Instruction Set 


The floating-point instruction set adds support for arithmetic functions using real 
numbers. The 80386 cannot directly execute floating-point instructions. However, 
when coupled with the 80387 numeric coprocessor, the instruction set is extended 
to include the instructions that are described on the following pages. The 80486 
requires no coprocessor, and it can directly execute any instruction marked for 
the 80387. 


[Figure: annotated sample entry (FICOM) identifying the parts of each floating-point reference page.]

PROCESSOR TYPE  Processors that support the instruction.
MNEMONIC        Used by the assembler to represent the instruction.
NAME            Name of instruction.
LEGAL FORMS     Legal forms of the instruction.
DESCRIPTION     Description of the instruction.
EXCEPTIONS      An "x" in a box indicates that the specified exception may be
                generated for the instruction. A "-" in a box indicates that the
                specified exception is not possible. SF = Stack fault.
                PE = Precision exception. UE = Underflow exception.
                OE = Overflow exception. ZE = Zero divide exception.
                DE = Denormal exception. IE = Invalid operation exception.
EXAMPLE         Each example shows the 80387 stack before and after execution
                of the instruction.


FICOM 8087/80287/80387
Integer Compare


Legal Forms
FICOM mem16 ; compare (ST, mem16)
FICOM mem32 ; compare (ST, mem32)
FICOMP mem16 ; compare (ST, mem16); pop();
FICOMP mem32 ; compare (ST, mem32); pop();
Description
The two's complement integer is converted to temp real format and compared with
the top of stack. If the opcode is FICOMP, the stack is popped after the comparison.
The condition codes are set in the same manner as those for FCOM.


Exceptions
SF PE UE OE ZE DE IE


FABS 8087/80287/80387
Absolute Value 


Legal Form 
FABS ; If (ST < 0) then ST ← ST * -1


Description 


This instruction replaces the original value of the element at the top of stack with its 
absolute value. 


Exceptions
SF PE UE OE ZE DE IE


Example
FABS
[Figure: 80387 stack before and after FABS]


FADD 8087/80287/80387
Addition 


Legal Forms 


FADD ; ST(1) ← ST + ST(1); pop();
FADD mem32 ; ST ← ST + mem32
FADD mem64 ; ST ← ST + mem64
FADD ST(n) ; ST ← ST + ST(n)
FADD ST, ST(n) ; ST ← ST + ST(n)
FADD ST(n), ST ; ST(n) ← ST(n) + ST
FADDP ST, ST(n) ; ST ← ST + ST(n); pop();
FADDP ST(n), ST ; ST(n) ← ST(n) + ST; pop();
Description 


This instruction adds the specified floating-point operands and optionally pops the 
top of stack. 


If you specify a memory operand, it is converted to temp real (80-bit) format before 
it is added to the top of stack. 


If you add a floating-point value to infinity, the result is the original infinity. If you 
add two infinities, they must have the same sign, and the result is the same infinity. 
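
The infinity rules can be observed with Python floats. This is only an analogy: Python uses 64-bit IEEE 754 doubles rather than the 80-bit temp real format, and it returns a NaN for the case that violates the same-sign rule (as a coprocessor with the invalid operation exception masked would be expected to do) instead of signaling an exception. The add function is a hypothetical wrapper for this sketch.

```python
import math


def add(a, b):
    """IEEE 754 double-precision addition as performed by Python floats."""
    return a + b


same_sign = add(math.inf, math.inf)        # two infinities of the same sign
finite_plus_inf = add(1.5, -math.inf)      # finite value added to an infinity
opposite = add(math.inf, -math.inf)        # infinities of opposite sign
```

Here same_sign is positive infinity, finite_plus_inf is the original negative infinity, and opposite is a NaN.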


Exceptions
SF PE UE OE ZE DE IE


Examples
FADD
FADD ST(2), ST
[Figure: 80387 stack before and after each instruction]


FBLD 8087/80287/80387
BCD Load 


Legal Form 
FBLD mem80 ; push(float(mem80))


Description 


This instruction converts an 80-bit, 18-digit BCD integer to a temp real and pushes it
onto the stack. If the memory operand is not a valid BCD integer, an undefined 
value is pushed onto the stack. 
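
The packed BCD layout that FBLD reads can be decoded with a short Python function. In this format the nine low-order bytes each hold two decimal digits and the high-order bit of the tenth byte holds the sign. The function name decode_packed_bcd is an assumption, and raising ValueError for an invalid digit is a choice made for this sketch (the coprocessor instead pushes an undefined value).

```python
def decode_packed_bcd(data):
    """Decode a 10-byte packed BCD integer (little-endian digit pairs, sign in byte 9)."""
    if len(data) != 10:
        raise ValueError("packed BCD operands are 10 bytes long")
    value = 0
    for byte in reversed(data[:9]):            # most significant digit pair first
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("not a valid BCD digit")
        value = value * 100 + high * 10 + low
    return -value if data[9] & 0x80 else value
```

A memory operand whose first byte is 17H and whose remaining bytes are 0 decodes to 17, the value that would then be converted to temp real and pushed.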


Exceptions
SF PE UE OE ZE DE IE


Example
FBLD [ESI]
ESI points to 17 BCD.
[Figure: 80387 stack before and after FBLD]


FBSTP 8087/80287/80387
BCD Store and Pop 


Legal Form 
FBSTP mem80 ; mem80 ← BCD(ST); pop();


Description
This instruction rounds the top of stack to an integer, stores it in memory in BCD
format, and then pops the stack.


Unlike most arithmetic operations, FBSTP signals the invalid (IE) exception if the
operand is a quiet NaN.


Exceptions
SF PE UE OE ZE DE IE


Example
FBSTP [0A2H]

After
ST -71.6

BCD 3 is stored in memory.


FCHS 8087/80287/80387
Change Sign 


Legal Form 
FCHS ; ST ← ST * -1


Description 
This instruction complements the sign bit of the top of stack. 


Exceptions 
SF PE UE OE ZE DE IE 




Example 
Before After 
ST 1023.99        ST -1023.99
ST(1)             ST(1)


FCHS

FCLEX 8087/80287/80387


Clear Exceptions 


Legal Forms 


FCLEX  ; SW ← SW & 07F00H
FNCLEX ; SW ← SW & 07F00H
Description 


This instruction clears the exception flags and the busy bit in the status word to 0.
The FCLEX form of the instruction checks for unmasked exceptions from previous
operations before clearing the flags. The FNCLEX form clears them without
checking.
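
The legal form reduces to a single mask. A minimal C sketch, with the hypothetical function name fclex_status:

```c
#include <stdint.h>

/* Models SW <- SW & 07F00H: bits 0..7 (the exception flags) and
   bit 15 (busy) are cleared; bits 8..14 are left unchanged. */
static uint16_t fclex_status(uint16_t sw)
{
    return (uint16_t)(sw & 0x7F00u);
}
```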


Exceptions 
SF PE UE OE ZE DE IE

FCOM 8087/80287/80387


Compare 


Legal Forms 


FCOM ; compare (ST, ST(1))

FCOM mem32 ; compare (ST, mem32) 

FCOM mem64 ; compare (ST, mem64) 

FCOM ST(n) ; compare (ST, ST(n)) 

FCOMP mem32 ; compare (ST, mem32); pop(); 

FCOMP mem64 ; compare (ST, mem64); pop(); 

FCOMP ST(n) ; compare (ST, ST(n)); pop(); 

FCOMPP ; compare (ST, ST(1)); pop(); pop(); 
Description 


This instruction performs the function compare (op1, op2) and sets the numeric 
condition code according to the result of the comparison. The floating-point stack 
is optionally popped once or twice. 


The following table shows the condition code settings that result from the compare 
function. FCOM considers +0.0 and —0.0 to be equal. 


Condition             C3   C2   C1   C0
op1 > op2             0    0    -    0
op1 < op2             0    0    -    1
op1 = op2             1    0    -    0
either op is a NaN    1    1    -    1


The numeric condition codes are arranged in the status word so that C3, C2, and C0
map into the same bit positions as the ZF, PF, and CF bits of the EFLAGS register.
Thus, issuing the following instructions sets the EFLAGS register as if the compare
had been performed on the integer values.


FCOM op ; Floating point compare 
FSTSW AX ; Store status word to AX 
SAHF ; Store AH into flags 


You can then use any conditional jump instruction (JE, JNE, JA, JAE, JB, or JBE) 
to branch on the result of the compare. You can use JP to test for NaN operands. 


Unlike most arithmetic operations, FCOM signals the invalid (IE) exception if either
operand is a quiet NaN.
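
The mapping from condition codes to flags can be modeled in C. The bit positions used below (C0, C2, and C3 at status word bits 8, 10, and 14; CF, PF, and ZF at flag bits 0, 2, and 6) come from Intel's register layouts rather than from this entry, and the names fcom_status and sahf_flags are hypothetical. SAHF also loads SF and AF; this sketch models only the three flags discussed above.

```c
#include <stdint.h>

enum { SW_C0 = 1 << 8, SW_C2 = 1 << 10, SW_C3 = 1 << 14 };
enum { FLAG_CF = 1 << 0, FLAG_PF = 1 << 2, FLAG_ZF = 1 << 6 };

/* Condition codes left in the status word by compare(op1, op2). */
static uint16_t fcom_status(double op1, double op2)
{
    if (op1 != op1 || op2 != op2) {
        return SW_C3 | SW_C2 | SW_C0;   /* either operand is a NaN */
    }
    if (op1 > op2) {
        return 0;
    }
    if (op1 < op2) {
        return SW_C0;
    }
    return SW_C3;                       /* equal; +0.0 equals -0.0 */
}

/* FSTSW AX followed by SAHF: AH is the high byte of the status word,
   so C0, C2, and C3 land on CF, PF, and ZF. */
static uint8_t sahf_flags(uint16_t sw)
{
    uint8_t ah = (uint8_t)(sw >> 8);
    return (uint8_t)(ah & (FLAG_CF | FLAG_PF | FLAG_ZF));
}
```

With op1 less than op2 only CF is set, so JB is taken; with a NaN operand PF is set, so JP is taken.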


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FCOM ST(2)
FCOMPP

FCOS 80387


Cosine 


Legal Form 
FCOS ; ST ← cos(ST)


Description 
This instruction computes the cosine of the value (in radians) at the top of stack
and replaces ST with the result.


The operand processed by FCOS must lie within ±2^63, or the instruction does not
execute and condition code C2 is set to 1. C2 is cleared to 0 if the instruction
is executed.


Exceptions 
SF PE UE OE ZE DE IE 


Example

FCOS

FDECSTP 8087/80287/80387
Decrement Stack Pointer 

Legal Form 

FDECSTP ; TOP ← (TOP - 1) & 07H

Description 


This instruction allows you to manipulate the floating-point stack pointer. Issuing 
FDECSTP is equivalent to pushing a new value onto the stack, but no value is sup- 
plied. The tag registers are not modified. 
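
Because TOP is a 3-bit field, the stack pointer wraps around. A minimal C sketch of the arithmetic in the legal forms of FDECSTP and FINCSTP (the function names are hypothetical):

```c
/* TOP <- (TOP - 1) & 07H: decrementing 0 wraps to 7. */
static unsigned fdecstp(unsigned top)
{
    return (top - 1u) & 7u;
}

/* TOP <- (TOP + 1) & 07H: incrementing 7 wraps to 0. */
static unsigned fincstp(unsigned top)
{
    return (top + 1u) & 7u;
}
```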


Exceptions 
SF PE UE OE ZE DE IE 


Example

FDECSTP

FDIV 8087/80287/80387
Division 


Legal Forms 


FDIV                ; ST(1) ← ST(1) / ST; pop();
FDIV mem32          ; ST ← ST / mem32
FDIV mem64          ; ST ← ST / mem64
FDIV ST(n)          ; ST ← ST / ST(n)
FDIV ST, ST(n)      ; ST ← ST / ST(n)
FDIV ST(n), ST      ; ST(n) ← ST(n) / ST
FDIVP ST, ST(n)     ; ST ← ST / ST(n); pop();
FDIVP ST(n), ST     ; ST(n) ← ST(n) / ST; pop();
Description 


This instruction executes a divide operation with the above operands. If you 
specify a memory operand, it is converted to temp real (80-bit) format before the 
division is performed. A stack pop operation is performed if specified by the 
opcode. 


Division by infinity results in 0. Infinity divided by a real number results in infinity. 
Infinity divided by infinity is not a valid operation. 


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FDIV
FDIV ST(2), ST

FDIVR 8087/80287/80387


Division Reversed 


Legal Forms 


FDIVR               ; ST(1) ← ST / ST(1); pop();
FDIVR mem32         ; ST ← mem32 / ST
FDIVR mem64         ; ST ← mem64 / ST
FDIVR ST(n)         ; ST ← ST(n) / ST
FDIVR ST, ST(n)     ; ST ← ST(n) / ST
FDIVR ST(n), ST     ; ST(n) ← ST / ST(n)
FDIVRP ST, ST(n)    ; ST ← ST(n) / ST; pop();
FDIVRP ST(n), ST    ; ST(n) ← ST / ST(n); pop();
Description 


This instruction executes a divide operation with the above operands. This instruc- 
tion is equivalent to FDIV, but the divisor and dividend operands are exchanged. If 
you specify a memory operand, it is converted to temp real (80-bit) format before 
the division is performed. A stack pop operation is performed if specified by the 
opcode. 


Division by infinity results in 0. Infinity divided by a real number results in infinity. 
Infinity divided by infinity is not a valid operation. 


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FDIVR
FDIVR ST(2), ST

FFREE 8087/80287/80387
Free NDP Register


Legal Form
FFREE ST(n) ; TW(n) ← UNUSED


Description

This instruction marks the specified stack element as unused by setting the tag
word for the corresponding floating-point register. The stack pointer is not
modified, nor is the actual content of the NDP register.


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before                After
ST    190000.3        ST    190000.3
ST(1)                 ST(1)
ST(2)                 ST(2) 0.001

FFREE ST(1)

FIADD 8087/80287/80387
Integer Addition 


Legal Forms 


FIADD mem16 ; ST ← ST + float(mem16)
FIADD mem32 ; ST ← ST + float(mem32)
Description 


This instruction converts the two’s complement integer at the specified address to 
temp real format and adds it to the top of stack. Other than the difference in 
operand type, this instruction is equivalent to FADD. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FIADD WORD PTR [ECX]
ECX points to the integer -2.

FICOM 8087/80287/80387


Integer Compare 


Legal Forms 


FICOM mem16 ; compare (ST, mem16) 

FICOM mem32 ; compare (ST, mem32) 

FICOMP mem16 ; compare (ST, mem16); pop();
FICOMP mem32 ; compare (ST, mem32); pop(); 
Description 


The two’s complement integer is converted to temp real format and compared with 
the top of stack. If the opcode is FICOMP, the stack is popped after the comparison. 


The condition codes are set in the same manner as those for FCOM. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FICOMP WORD PTR [0FC6H]
The memory operand is the integer 6. After: ST = 13792.29731.

FIDIV 8087/80287/80387
Integer Division 


Legal Forms 


FIDIV mem16 ; ST ← ST / real(mem16)
FIDIV mem32 ; ST ← ST / real(mem32)
Description 


This instruction fetches the two’s complement integer from memory, converts it to 
temp real format, and uses it as a divisor of the top of stack. The results generated 
by this instruction are the same as those generated by the FDIV instruction. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST = 1.0

FIDIV DWORD PTR [EBP+16]
The memory operand is the integer -4.

FIDIVR 8087/80287/80387


Integer Division Reversed 


Legal Forms 


FIDIVR mem16 ; ST ← real(mem16) / ST
FIDIVR mem32 ; ST ← real(mem32) / ST
Description 


This instruction converts the two’s complement integer at the specified memory 
location to temp real format and divides it by the top of stack. The results generated 
by this instruction are the same as those generated by the FDIVR instruction. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST(1) = 22

FIDIVR DWORD PTR [EBP+16]
The memory operand is the integer -4.

FILD 8087/80287/80387
Integer Load 


Legal Forms 


FILD mem16 ; push(float(mem16))
FILD mem32 ; push(float(mem32))
FILD mem64 ; push(float(mem64))
Description 


This instruction converts a two’s complement integer to temp real format and 
pushes the value onto the 80387 stack. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST = 1.209. After: ST(1) = 1.209.

FILD QWORD PTR [EAX]
The memory operand is the integer 666.

FIMUL 8087/80287/80387
Integer Multiplication 


Legal Forms 


FIMUL mem16 ; ST ← ST * real(mem16)
FIMUL mem32 ; ST ← ST * real(mem32)
Description 


This instruction converts the two’s complement integer at the specified memory 
location to temp real format and multiplies it by the top of stack. The results of this 
instruction are identical to those obtained by FMUL. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FIMUL WORD PTR [ESI+EAX]
The memory operand is the integer -4.

FINCSTP 8087/80287/80387


Increment Stack Pointer 


Legal Form
FINCSTP ; TOP ← (TOP + 1) & 07H


Description
This instruction increments the TOP field in the floating-point status word. The
contents of the floating-point register previously at the top of stack and the
register's associated tag word are not affected.


Exceptions 
SF PE UE OE ZE DE IE 


Example

FINCSTP
After: the register that was ST is addressed as ST(7).


FINIT 8087/80287/80387 
Initialize NDP 


Legal Forms 


FINIT  ; CW ← 037FH; SW ← SW & 4700H; TW ← 0FFFFH
FNINIT ; CW ← 037FH; SW ← SW & 4700H; TW ← 0FFFFH
Description 


This instruction sets the FPU state to its default value. All registers are marked
unused, all exceptions are masked, rounding control is set to nearest, and
precision control is set to extended precision (a 64-bit significand), which is
the setting that the control word value 037FH selects.


The FINIT instruction tests for any unmasked exception before clearing the NDP
state; FNINIT does not. Consequently, the first floating-point instruction
of an application should be FNINIT.
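
The default control word 037FH can be decoded field by field. The field positions in this C sketch (exception masks in bits 0 through 5, precision control in bits 8 and 9, rounding control in bits 10 and 11) are taken from Intel's control word layout, not from this entry, and the function names are hypothetical.

```c
#include <stdint.h>

/* Bits 0..5: exception masks. 3FH means all six are masked. */
static unsigned cw_exception_masks(uint16_t cw)
{
    return cw & 0x3Fu;
}

/* Bits 8..9: precision control. 3 selects a 64-bit significand. */
static unsigned cw_precision_control(uint16_t cw)
{
    return (cw >> 8) & 3u;
}

/* Bits 10..11: rounding control. 0 selects round to nearest. */
static unsigned cw_rounding_control(uint16_t cw)
{
    return (cw >> 10) & 3u;
}
```

Applied to 037FH, these return 3FH, 3, and 0 respectively.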


Exceptions 
SF PE UE OE ZE DE IE

FIST 8087/80287/80387


Integer Store 


Legal Forms 


FIST mem16  ; mem16 ← int(ST)
FIST mem32  ; mem32 ← int(ST)
FISTP mem16 ; mem16 ← int(ST); pop();
FISTP mem32 ; mem32 ← int(ST); pop();
FISTP mem64 ; mem64 ← int(ST); pop();


Description 


This instruction rounds the current top of stack to an integer according to the con- 
trol bits and stores the value in the specified operand. If the opcode is FISTP, the 
stack is popped after the store operation. Note that the sign of a floating-point 0 is 
lost upon conversion to the two’s complement integer format. 
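
The effect of the rounding control bits on the stored integer can be sketched in C. The RC encodings assumed here (0 = nearest or even, 1 = down, 2 = up, 3 = chop) follow Intel's control word definition; the function name round_rc is hypothetical, and the sketch assumes the value fits in a long long.

```c
/* Rounds x to an integer under rounding control rc (0..3). */
static long long round_rc(double x, unsigned rc)
{
    long long t = (long long)x;      /* a C cast truncates toward zero */
    double frac = x - (double)t;     /* same sign as x, magnitude below 1 */

    switch (rc & 3u) {
    case 1:                          /* round down, toward -infinity */
        return (frac < 0.0) ? t - 1 : t;
    case 2:                          /* round up, toward +infinity */
        return (frac > 0.0) ? t + 1 : t;
    case 3:                          /* chop, toward zero */
        return t;
    default:                         /* nearest; ties go to the even value */
        if (frac > 0.5 || (frac == 0.5 && (t & 1))) {
            return t + 1;
        }
        if (frac < -0.5 || (frac == -0.5 && (t & 1))) {
            return t - 1;
        }
        return t;
    }
}
```

Note that round_rc(-0.0, 0) returns the integer 0, which illustrates how the sign of a floating-point 0 is lost.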


Two differences exist between FIST and FISTP. The FISTP instruction, which pops 
the stack after the store operation, can store a 64-bit integer; FIST cannot. The FIST 
instruction generates an invalid operation exception if the top of stack is a quiet 
NaN; FISTP does not. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FIST DWORD PTR [EBP+42]
The integer 32 is stored in memory.

FISUB 8087/80287/80387


Integer Subtraction 


Legal Forms 


FISUB mem16 ; ST ← ST - real(mem16)
FISUB mem32 ; ST ← ST - real(mem32)
Description 


This instruction converts the two’s complement integer at the specified memory 
location to temp real format and subtracts it from the top of stack. The results of this 
instruction are identical to those obtained by FSUB. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FISUB WORD PTR [0A72H]
The memory operand is the integer 3.


FISUBR 8087/80287/80387


Integer Subtraction Reversed 


Legal Forms 


FISUBR mem16 ; ST ← real(mem16) - ST
FISUBR mem32 ; ST ← real(mem32) - ST
Description 


This instruction converts the two’s complement integer at the specified memory 
location to temp real format and subtracts the top of stack from it. The results of this 
instruction are identical to those obtained by FSUBR. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FISUBR WORD PTR [0A72H]
The memory operand is the integer 3.

FLD 8087/80287/80387
Load Real 


Legal Forms 


FLD mem32 ; push(mem32)
FLD mem64 ; push(mem64)
FLD mem80 ; push(mem80)
FLD ST(n) ; push(ST(n))
Description 


This instruction pushes a copy of the specified operand onto the floating-point 
stack. If you specify a 32-bit or 64-bit floating-point memory operand, it is con- 
verted to temp real format before being stored. 


If the operand is a single- or double-precision value, the FPU might generate a 
denormal exception. A denormal exception is not generated by a value already in 
temp real format. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FLD DWORD PTR [EDX]
The memory operand is the short real 6.1.

FLDconst 8087/80287/80387
Load Constant


Legal Forms

FLD1   ; push(1.0)
FLDL2E ; push(log2(e))
FLDL2T ; push(log2(10))
FLDLG2 ; push(log10(2))
FLDLN2 ; push(ln(2))
FLDPI  ; push(PI)
FLDZ   ; push(+0.0)


Description

This instruction pushes the constant value specified by the opcode onto the stack.
The function ln stands for log base e.


Exceptions
SF PE UE OE ZE DE IE


Example
Before            After
                  ST    3.14159...
ST                ST(1)

FLDCW 8087/80287/80387


Load Control Word 


Legal Form 
FLDCW mem16 ; CW ← mem16


Description 
This instruction loads a new value for the control word from memory. FLDCW can 
unmask previously masked exceptions, triggering an unmasked exception. 


Exceptions 
SF PE UE OE ZE DE IE

FLDENV 8087/80287/80387


Load Environment 


Legal Form 
FLDENV memp ; NDP ← memp


Description 


This instruction loads the 28-byte block pointed to by memp into the environment 
registers of the FPU. The memory operand contains a new control word, status 
word, tag word, and error block. The memory format for the environment is shown 
in Figure 8-1. 


[Figure 8-1 diagram: the 32-bit format spans byte offsets 0 through 24, with a
reserved field in the doubleword at offset 24; the 16-bit format spans byte
offsets 0 through 12.]

Figure 8-1. Floating-point environment.

Loading a new status word and control word can cause an unmasked exception.
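
The 32-bit environment block can be described as a C structure. The field order and offsets below follow Intel's protected-mode 32-bit layout and are stated here as an assumption, since the figure itself gives only the byte offsets; the structure and field names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 28-byte floating-point environment, 32-bit protected-mode format. */
struct fpu_env32 {
    uint16_t control_word;      /* offset 0  */
    uint16_t reserved0;
    uint16_t status_word;       /* offset 4  */
    uint16_t reserved1;
    uint16_t tag_word;          /* offset 8  */
    uint16_t reserved2;
    uint32_t ip_offset;         /* offset 12: instruction pointer offset */
    uint16_t cs_selector;       /* offset 16 */
    uint16_t opcode;            /* low 11 bits hold the opcode */
    uint32_t operand_offset;    /* offset 20: operand pointer offset */
    uint16_t operand_selector;  /* offset 24 */
    uint16_t reserved3;
};
```

Every field is naturally aligned, so a typical compiler adds no padding and sizeof(struct fpu_env32) is 28.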


Exceptions 
SF PE UE OE ZE DE IE 


FMUL 8087/80287/80387
Multiplication


Legal Forms

FMUL                ; ST(1) ← ST(1) * ST; pop();
FMUL mem32          ; ST ← ST * mem32
FMUL mem64          ; ST ← ST * mem64
FMUL ST(n)          ; ST ← ST * ST(n)
FMUL ST, ST(n)      ; ST ← ST * ST(n)
FMUL ST(n), ST      ; ST(n) ← ST(n) * ST
FMULP ST, ST(n)     ; ST ← ST * ST(n); pop();
FMULP ST(n), ST     ; ST(n) ← ST(n) * ST; pop();


Description

This instruction multiplies the specified operands and stores the result as indicated
above. If you specify 32-bit or 64-bit memory operands, they are converted to temp
real format before the multiplication takes place. If the opcode specifies, the stack is
popped after the operation.


Multiplying any value other than 0 by infinity results in infinity. Multiplying 0 by
infinity is an invalid operation.


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FMUL
FMUL ST(1)

FNOP 8087/80287/80387
No Operation 


Legal Form 
FNOP 


Description 
FNOP is an alias for the FST ST, ST instruction. It does nothing. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FNOP

FPATAN 8087/80287/80387


Partial Arctangent 


Legal Form 
FPATAN ; ST(1) <— atan(ST(1) / ST); pop(); 


Description 


This instruction computes the arctangent in radians of ST(1) ÷ ST. The mnemonic
“partial arctangent” is inherited from earlier NDPs, which placed restrictions on the
values of ST and ST(1). These values are not restricted on the 80387 or 80486.


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST(1) = 1.0

FPATAN

FPREM 8087/80287/80387
Partial Remainder 


Legal Form 
FPREM ; ST ← remainder (ST / ST(1))


Description 


This instruction uses repeated subtractions to compute the remainder of ST divided
by ST(1). Because this operation could require a large number of iterations (during
which time the NDP would be inaccessible), the instruction halts after producing
a partial remainder. The value in ST is reduced by a factor of up to 2^64 in a single
iteration.


If the remainder is a partial value (that is, the operation does not complete), the C2
status bit is set to 1. If the remainder is less than the value of ST(1), the operation is
complete and bit C2 is cleared to 0. By testing the value of C2, you can execute the
FPREM instruction repeatedly until the remainder operation yields an exact
result. Additionally, when the instruction is complete (C2 = 0), the three least
significant bits of the quotient of ST ÷ ST(1) can be computed by the following formula:


Q = C0 × 4 + C3 × 2 + C1
where C0, C1, and C3 are the remaining status bits.
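
The quotient formula can be evaluated directly from a stored status word. In this C sketch the bit positions (C0, C1, C2, and C3 at status word bits 8, 9, 10, and 14) are taken from Intel's status word layout, and the function names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 while the remainder is still partial (C2 = 1). */
static int fprem_incomplete(uint16_t sw)
{
    return (int)((sw >> 10) & 1u);
}

/* Q = C0 * 4 + C3 * 2 + C1; meaningful only when C2 = 0. */
static unsigned fprem_quotient_low3(uint16_t sw)
{
    unsigned c0 = (sw >> 8) & 1u;
    unsigned c1 = (sw >> 9) & 1u;
    unsigned c3 = (sw >> 14) & 1u;

    return c0 * 4u + c3 * 2u + c1;
}
```

A program would store the status word with FSTSW after each FPREM, repeat while fprem_incomplete reports 1, and then read the quotient bits.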


The FPREM instruction reduces operands for the transcendental functions to legal
values. For example, the operand to F2XM1 must be -1 < ST < 1. FPREM produces an
exact result, and the precision control and rounding control bits are ignored during
execution.


The FPREM1 instruction produces the IEEE-754 standard partial remainder value,
which may be different from FPREM when there are two integers equally close to
ST ÷ ST(1). FPREM rounds toward 0, and FPREM1 chooses the even value.


Exceptions 
SF PE UE OE ZE DE IE


Example

FPREM
After: C2 = 0


FPREM1 80387


IEEE Partial Remainder 


Legal Form 
FPREM1 ; ST ← remainder (ST / ST(1))


Description 

This instruction uses repeated subtractions to compute the remainder of ST divided
by ST(1). Because this operation could require a large number of iterations (during
which time the NDP would be inaccessible), the instruction halts after producing
a partial remainder. The value in ST is reduced by a factor of up to 2^64 in a single
iteration.


If the remainder is a partial value (that is, the operation is not complete), the C2
status bit is set to 1. If the remainder is less than the value of ST(1), the operation is
complete and bit C2 is cleared to 0. By testing the value of C2, you can execute the
FPREM1 instruction repeatedly until the remainder operation yields an exact
result. Additionally, when the instruction is complete (C2 = 0), the three least
significant bits of the quotient of ST ÷ ST(1) can be computed by the following formula:


Q = C0 × 4 + C3 × 2 + C1
where C0, C1, and C3 are the remaining status bits.


The FPREM1 instruction reduces operands for the transcendental functions of the
80387 to legal values. For example, the operand to F2XM1 must be -1 < ST < 1.
FPREM1 always produces an exact result, and the precision control and rounding
control bits are ignored during execution.


The FPREM1 instruction produces the IEEE-754 standard partial remainder value,
which may be different from FPREM when there are two integers equally close to
ST ÷ ST(1). FPREM always rounds toward 0, and FPREM1 always chooses the even
value.


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST(1) = 4.0

FPREM1
After: C2 = 0


FPTAN 8087/80287/80387


Partial Tangent 


Legal Form 
FPTAN ; ST ← tan(ST); push(1.0);


Description 
This instruction computes the tangent of the top of stack and arranges the NDP
stack such that:


ST(1) / ST = tan(original ST)

The denominator is always 1.0 after the FPTAN instruction.


The operand value must be a positive number, expressed in radians, that is less than
PI × 2^62, or no operation takes place and the C2 condition code bit is set to 1. If the
input operand is legal, C2 is cleared to 0.


Exceptions 
SF PE UE OE ZE DE IE 


Example

FPTAN

FRNDINT 8087/80287/80387


Round to Integer 


Legal Form 
FRNDINT ; ST ← int(ST)


Description

This instruction rounds the value at the top of stack to an integer based on the set- 
tings of the round control (RC) field in the control word. See Chapter 2 for a discus- 
sion of the NDP rounding modes. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FRNDINT

FRSTOR 8087/80287/80387


Restore NDP State 


Legal Form 
FRSTOR memp ; NDP ← memp


Description 


This instruction loads the entire floating-point processor state from the 108-byte 
block of data beginning at memp. Use the FSAVE instruction to store the NDP state. 
Figure 8-2 shows the format of the state block. 


[Figure 8-2 diagram: each format (16-bit for real and V86 modes, and 32-bit)
consists of an environment portion followed by a register portion; in the
32-bit format the last doubleword is at byte offset 104.]

Figure 8-2. NDP state.


New unmasked exceptions might be triggered because a new status word and con- 
trol word are loaded. 
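
The 108-byte size follows from the two portions of the block: a 28-byte environment and eight 10-byte (80-bit) registers. A minimal C sketch of the 32-bit format, with hypothetical structure and field names:

```c
#include <stddef.h>
#include <stdint.h>

/* State block used by FSAVE and FRSTOR in the 32-bit format. */
struct fpu_state32 {
    uint8_t environment[28];  /* control, status, and tag words; error pointers */
    uint8_t st[8][10];        /* ST through ST(7), 10 bytes each */
};
```

Here 28 + 8 × 10 = 108, which matches the block size given in the description.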


Exceptions 
SF PE UE OE ZE DE IE 


FSAVE 8087/80287/80387
Save NDP State


Legal Forms

FSAVE memp  ; memp ← NDP
FNSAVE memp ; memp ← NDP


Description

This instruction stores the complete processor state of the floating-point unit in
memory beginning at location memp. Figure 8-3 shows the format of the state block.

[Figure 8-3 diagram: each format (16-bit for real and V86 modes, and 32-bit)
consists of an environment portion followed by a register portion.]

Figure 8-3. NDP state.


After the FSAVE is completed, the NDP state is set to the initialized state, as if an 
FNINIT instruction had been executed. 


The FSAVE form of the instruction tests for any unmasked exceptions before
executing the save, while FNSAVE does not. If you use FNSAVE, pending exceptions are
reinstated when the state block is loaded by an FRSTOR instruction. FSAVE is not
executed until previous floating-point instructions complete.


Exceptions 
SF PE UE OE ZE DE IE

FSCALE 8087/80287/80387


Scale by 2^n


Legal Form
FSCALE ; ST ← ST * 2^int(ST(1))


Description 

This instruction scales the top of stack value by the power of 2 in ST(1). If the value
in ST(1) is not an integer, it is “chopped” before being used as an exponent. Chopping
rounds toward 0; it generates the nearest integer whose magnitude does not exceed
that of the original value.


The NDP does not perform a multiply operation; it uses the identity
(x × 2^n)(1.0 × 2^m) = x × 2^(n+m) and adds the integral portion of ST(1) to the
exponent of ST.
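
The chop-then-scale behavior can be modeled in C. This sketch multiplies repeatedly by 2.0 or 0.5 instead of adjusting the exponent field, so it is practical only for small exponents; the function name fscale is hypothetical.

```c
/* Scales x by 2 raised to the chopped value of y. */
static double fscale(double x, double y)
{
    long n = (long)y;   /* a C cast chops toward zero: 2.9 -> 2, -1.5 -> -1 */
    double r = x;

    for (; n > 0; --n) {
        r *= 2.0;       /* exact unless the result overflows */
    }
    for (; n < 0; ++n) {
        r *= 0.5;       /* exact unless the result underflows */
    }
    return r;
}
```

For example, fscale(3.0, 2.9) scales by 2^2 and returns 12.0, and fscale(3.0, -1.5) scales by 2^-1 and returns 1.5.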


Exceptions 
SF PE UE OE ZE DE IE 


Example

FSCALE

FSETPM 80287/80387


Set Protected Mode 


Legal Form 
FSETPM 


Description 

This instruction performs no operation on the 80387 or 80486. It is required on the 
80287 to signal that the CPU is entering protected mode and is supported for com- 
patibility only. 


Exceptions 
SF PE UE OE ZE DE IE

FSIN 80387


Sine 


Legal Form 
FSIN ; ST ← sin(ST);


Description 
This instruction computes the sine of the top of stack and stores the result in ST. 
The value in ST is assumed to be in radians. 


The input operand to FSIN must be a value such that |ST| < 2^63, or no operation
takes place and the C2 condition code is set to 1. If the operand is a legal value, C2
is cleared to 0.


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST = 3.14159..., ST(1) = 88.6

FSIN

FSINCOS 80387


Sine and Cosine 


Legal Form 


FSINCOS ; temp ← ST; ST ← sin(temp);
        ; push(cos(temp))


Description 
This instruction computes both the sine and cosine of the top of stack, although the
values might be less precise than those generated by FSIN and FCOS. The value in
ST is assumed to be in radians.


The input operand to FSINCOS must be a value such that |ST| < 2^63, or no operation
takes place and the C2 condition code is set to 1. If the operand is a legal value,
C2 is cleared to 0, the top of stack is the cosine value, and ST(1) contains the sine.


Exceptions 
SF PE UE OE ZE DE IE 


Example

Before: ST = 3.14159...

FSINCOS

FSQRT 8087/80287/80387


Square Root 


Legal Form 
FSQRT ; ST ← sqrt(ST)


Description 

This instruction replaces the top of stack with the square root of the original value. 
Taking the square root of a negative value results in an invalid operation, except 
that the square root of negative zero (—0.0) is defined as —0.0. The square root of 
infinity (positive) is defined to be infinity. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FSQRT


FST 8087/80287/80387
Store Floating Point 


Legal Forms 


FST mem32  ; mem32 ← ST
FST mem64  ; mem64 ← ST
FST ST(n)  ; ST(n) ← ST
FSTP mem32 ; mem32 ← ST; pop();
FSTP mem64 ; mem64 ← ST; pop();
FSTP mem80 ; mem80 ← ST; pop();
FSTP ST(n) ; ST(n) ← ST; pop();
Description 


This instruction stores the top of stack in the designated destination. If the opcode 
is FSTP, the stack top is popped (discarded) after the store operation. If the destina- 
tion is a 32-bit or 64-bit real memory operand, the top of stack is rounded according 
to the rounding control (RC) bits of the control word. 


Note that the FSTP form of this instruction can store a temp real (80-bit) value, 
while the FST form cannot. 


Exceptions 
SF PE UE OE ZE DE IE 


Example

FST QWORD PTR [ESI]
The long real 69.0 is stored in memory.

FSTCW 8087/80287/80387


Store Control Word 


Legal Forms 


FSTCW mem16  ; mem16 ← CW
FNSTCW mem16 ; mem16 ← CW
Description 


This instruction stores the contents of the control word (CW) register in memory. 
The FSTCW form of the instruction checks for unmasked exceptions before the 
control word is stored, while FNSTCW does not. 


Exceptions 
SF PE UE OE ZE DE IE

FSTENV 8087/80287/80387


Store Environment 


Legal Forms 


FSTENV memp  ; memp ← env(NDP)
FNSTENV memp ; memp ← env(NDP)


Description 


This instruction stores the contents of the floating-point environment registers (CW, 
SW, TW, and error pointers) in memory beginning at memp. Figure 8-4 outlines the 
format of the 28-byte environment block. 


[Figure 8-4 diagram: the 32-bit format spans byte offsets 0 through 24; the 16-bit
format holds the status word, the instruction pointer, and the operand pointer
in 16-bit fields.]

Figure 8-4. NDP environment.


The FSTENV form of the instruction checks for unmasked exceptions before the 
environment is stored, while FNSTENV does not. If unmasked exceptions are pend- 
ing before FNSTENV is executed, they are reactivated if the environment block is 
loaded with FLDENV. 


Exceptions 
SF PE UE OE ZE DE IE

FSTSW 8087/80287/80387


Store Status Word 


Legal Forms 


FSTSW AX     ; AX ← SW
FSTSW mem16  ; mem16 ← SW
FNSTSW AX    ; AX ← SW
FNSTSW mem16 ; mem16 ← SW


Description 


This instruction stores the contents of the NDP status word in memory or in the AX
register. The FSTSW form of the instruction checks for unmasked exceptions before
the status word is stored, while FNSTSW does not.


Exceptions 
SF PE UE OE ZE DE IE

FSUB 8087/80287/80387


Subtraction 


Legal Forms 


FSUB                ; ST(1) ← ST - ST(1); pop();
FSUB mem32          ; ST ← ST - mem32
FSUB mem64          ; ST ← ST - mem64
FSUB ST(n)          ; ST ← ST - ST(n)
FSUB ST, ST(n)      ; ST ← ST - ST(n)
FSUB ST(n), ST      ; ST(n) ← ST(n) - ST
FSUBP ST, ST(n)     ; ST ← ST - ST(n); pop();
FSUBP ST(n), ST     ; ST(n) ← ST(n) - ST; pop();
Description 


This instruction subtracts the specified operands and stores the result on the stack 
as shown above. Optionally, the top-of-stack is also popped. 


If you specify a 32-bit or 64-bit real memory operand, it is converted to temp real 
format before it is subtracted from ST. 


If any real value is subtracted from infinity or infinity is subtracted from any real 
value, the result is infinity. Subtracting two infinities of the same sign is an invalid 
operation. 


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FSUB
FSUB DWORD PTR [ESI+4]
The memory operand is the short real 2.2.

FSUBR 8087/80287/80387


Subtraction Reversed 


Legal Forms 


FSUBR               ; ST(1) ← ST(1) - ST; pop();
FSUBR mem32         ; ST ← mem32 - ST
FSUBR mem64         ; ST ← mem64 - ST
FSUBR ST(n)         ; ST ← ST(n) - ST
FSUBR ST, ST(n)     ; ST ← ST(n) - ST
FSUBR ST(n), ST     ; ST(n) ← ST - ST(n)
FSUBRP ST, ST(n)    ; ST ← ST(n) - ST; pop();
FSUBRP ST(n), ST    ; ST(n) ← ST - ST(n); pop();
Description 


This instruction subtracts the specified operands and stores the result on the stack 
as shown above. This instruction is equivalent to FSUB except that the subtrahend 
and minuend are exchanged. Optionally, the top of stack is also popped. 


If you specify a 32-bit or 64-bit real memory operand, it is converted to temp real
format before ST is subtracted from it.


If any real value is subtracted from infinity or infinity is subtracted from any real 
value, the result is infinity. Subtracting two infinities of the same sign is an invalid 
operation. 


Exceptions 
SF PE UE OE ZE DE IE 


Examples

FSUBR
FSUBR DWORD PTR [ESI+4]
The memory operand is the short real 2.2.

FTST 8087/80287/80387


Test for Zero 


Legal Form 
FTST ; compare (ST, 0.0) 


Description
This instruction compares the top of stack with 0.0 and sets the floating-point con- 
dition codes according to the results of the comparison. 


The following table shows the condition code settings that result from the com- 
parison function. FTST considers +0.0 and —0.0 to be equal. 


Condition      C3   C2   C1   C0
ST > 0.0       0    0    -    0
ST < 0.0       0    0    -    1
ST = 0.0       1    0    -    0
ST is a NaN    1    1    -    1


The condition codes are arranged in the status word so that C3, C2, and C0 map into
the same bit positions as the ZF, PF, and CF bits of the EFLAGS register. Thus,
issuing the following instructions sets the EFLAGS register as if the comparison had
been performed on unsigned integer values:


FTST            ; Floating-point compare
FSTSW AX        ; Store status word to AX
SAHF            ; Store AH into flags


You can then use any conditional jump instruction (JE, JNE, JA, JAE, JB, or JBE) to 
branch on the result of the comparison. Use JP to test whether ST is a NaN. 
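

The mapping from condition codes to flags can be sketched in Python. The bit positions used here (C0 = bit 8, C2 = bit 10, C3 = bit 14 of the status word; CF = bit 0, PF = bit 2, ZF = bit 6 of AH) come from the standard NDP status word and SAHF layouts rather than from this entry, and the function names are hypothetical.

```python
import math

C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14   # condition-code bits in the status word
CF, PF, ZF = 1 << 0, 1 << 2, 1 << 6     # flag bits that SAHF loads from AH

def ftst_status(st):
    """Return a status word holding only the condition codes FTST would set."""
    if math.isnan(st):
        return C3 | C2 | C0
    if st > 0.0:
        return 0
    if st < 0.0:
        return C0
    return C3                            # +0.0 and -0.0 both compare equal to 0.0

def flags_after_sahf(status_word):
    """Model FSTSW AX followed by SAHF: the high byte of the status word becomes the flags."""
    ah = (status_word >> 8) & 0xFF
    return {"ZF": bool(ah & ZF), "PF": bool(ah & PF), "CF": bool(ah & CF)}
```

For example, flags_after_sahf(ftst_status(-1.5)) sets only CF, so JB is taken, just as it would be after an unsigned integer compare of a smaller value against a larger one.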


Unlike most arithmetic operations, FTST will signal the Invalid (IE) exception if ST 
is a quiet NaN. 


Exceptions 
SF PE UE OE ZE DE IE 


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Before 


FTST 
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FUCOM 80387 


Unordered Compare 


Legal Forms 


FUCOM                  ; compare (ST, ST(1))
FUCOM mem32            ; compare (ST, mem32)
FUCOM mem64            ; compare (ST, mem64)
FUCOM ST(n)            ; compare (ST, ST(n))
FUCOMP                 ; compare (ST, ST(1)); pop();
FUCOMP mem32           ; compare (ST, mem32); pop();
FUCOMP mem64           ; compare (ST, mem64); pop();
FUCOMP ST(n)           ; compare (ST, ST(n)); pop();
FUCOMPP                ; compare (ST, ST(1)); pop(); pop();


Description 


This instruction is identical to FCOM except that no exception is signaled if either
operand in the compare function is a quiet NaN (the comparison is reported as unordered).
FUCOM executes the function compare (op1, op2) and sets the floating-point
condition codes according to the results of the comparison. The stack is optionally
popped once or twice.


The following table shows the condition code settings that result from the compare
function. FUCOM considers +0.0 and -0.0 to be equal.


Condition                   C3   C2   C1   C0
op1 > op2                   0    0    -    0
op1 < op2                   0    0    -    1
op1 = op2                   1    0    -    0
unordered (NaN compared)    1    1    -    1


The condition codes are arranged in the status word so that C3, C2, and C0 map into
the same bit positions as the ZF, PF, and CF bits of the EFLAGS register. Thus, the
following instructions set the EFLAGS register flags as if the comparison had been
performed on unsigned integer values:


FUCOM op ; Floating-point compare 
FSTSW AX ; Store status word to AX 
SAHF ; Store AH into flags 


You can then use any conditional jump instruction (JE, JNE, JA, JAE, JB, or JBE) to 
branch on the result of the comparison. Use JP to test for unordered comparison. 
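

To see which conditional jumps would be taken for each outcome, the table and the flag mapping can be combined in a small Python model. The helper names are hypothetical, and the jump conditions are the standard unsigned-compare definitions (for example, JA is taken when CF = 0 and ZF = 0).

```python
import math

def fucom_flags(op1, op2):
    """Flags after FUCOM / FSTSW AX / SAHF, using C3 -> ZF, C2 -> PF, C0 -> CF."""
    if math.isnan(op1) or math.isnan(op2):
        return {"ZF": 1, "PF": 1, "CF": 1}      # unordered
    if op1 > op2:
        return {"ZF": 0, "PF": 0, "CF": 0}
    if op1 < op2:
        return {"ZF": 0, "PF": 0, "CF": 1}
    return {"ZF": 1, "PF": 0, "CF": 0}

def taken_jumps(flags):
    """Return the sorted list of conditional jumps that would be taken."""
    zf, pf, cf = flags["ZF"], flags["PF"], flags["CF"]
    table = {
        "JE": zf == 1, "JNE": zf == 0,
        "JA": cf == 0 and zf == 0, "JAE": cf == 0,
        "JB": cf == 1, "JBE": cf == 1 or zf == 1,
        "JP": pf == 1,
    }
    return sorted(name for name, taken in table.items() if taken)
```

In the unordered case the model reports JB, JBE, and JE as taken along with JP, which is why code that must handle NaNs should test JP before any of the other jumps.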


Exceptions
SF PE UE OE ZE DE IE


Example


FUCOMP ST(2)
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FWAIT 8087/80287/80387 
Wait Until Not Busy 


Legal Form 
FWAIT 


Description 


This is an alternative mnemonic for the WAIT instruction. Many assemblers
accept the FWAIT spelling because the instruction relates to the NDP. (See “WAIT”
earlier in this chapter.)


Exceptions 
SF PE UE OE ZE DE IE 




8: Reference Section 


FXAM 8087/80287/80387 
Examine Top of Stack 


Legal Form 
FXAM                   ; CC ← examine (ST)


Description 


This instruction sets the condition code bits in the floating-point status word (SW) 
according to the value of the top of stack. The following table indicates the settings 
that can arise based on different values of ST. 


ST                     C3   C2   C1   C0
Unsupported*           0    0    s    0
NaN                    0    0    s    1
Valid (normal)         0    1    s    0
Infinity               0    1    s    1
Zero                   1    0    s    0
Unused (TW = empty)    1    0    s    1
Denormal               1    1    s    0
Unused (TW = empty)    1    1    s    1


*Unsupported values are special bit patterns that were valid
for the 8087 or 80287 but are no longer supported. These
include pseudo-NaN, pseudo-zero, pseudo-infinity,
and unnormals.


The s bit in C1 is set to the sign of the value of ST, with 0 indicating a positive value 
and 1 indicating a negative. 
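

The classification can be approximated for ordinary Python floats as follows. This is a sketch under stated assumptions: Python floats are 64-bit doubles, so the denormal threshold differs from temp real, and the "unsupported" and "unused" rows cannot occur and are not modeled. The function name fxam is used only for illustration.

```python
import math
import sys

def fxam(st):
    """Return (C3, C2, C1, C0) for a Python float, following the table above."""
    c1 = 1 if math.copysign(1.0, st) < 0 else 0   # sign bit; also works for -0.0
    if math.isnan(st):
        c3, c2, c0 = 0, 0, 1
    elif math.isinf(st):
        c3, c2, c0 = 0, 1, 1
    elif st == 0.0:
        c3, c2, c0 = 1, 0, 0
    elif abs(st) < sys.float_info.min:            # smaller than the smallest normal double
        c3, c2, c0 = 1, 1, 0
    else:
        c3, c2, c0 = 0, 1, 0
    return (c3, c2, c1, c0)
```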


Exceptions 
SF PE UE OE ZE DE IE 




MICROSOFT’S 80386/80486 PROGRAMMING GUIDE 


Example

Before                After
ST(1)  46.0           ST(1)  46.0

FXAM
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FXCH 8087/80287/80387 
Exchange Stack Elements 


Legal Forms 


FXCH                   ; temp ← ST; ST ← ST(1); ST(1) ← temp
FXCH ST(n)             ; temp ← ST; ST ← ST(n); ST(n) ← temp


Description 


This instruction swaps the contents of the specified stack registers. This allows 
values to move to the top of stack, which is the standard operand location for many 
floating-point instructions. 


Exceptions 
SF PE UE OE ZE DE IE


Example 


Before 


FXCH 
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FXTRACT 8087/80287/80387 


Extract Floating-Point Components 


Legal Form 

FXTRACT                ; temp ← ST; ST ← exponent(temp);
                       ; push(fraction(temp))

Description 


This instruction breaks the top of stack into its constituent parts, the significand and
the exponent. The exponent is stored as a true, unbiased value, not as just the bit
pattern in the exponent field of the floating-point representation. This operation
leaves the fraction or significand on the top of stack and the exponent at ST(1). The
original value is destroyed.


If the original top of stack is 0, the exponent portion is set to negative infinity. 
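

A rough Python equivalent, assuming a finite 64-bit operand, uses math.frexp and rescales its result so that the significand lies in the range 1 to 2. The function name fxtract and the tuple it returns (significand for ST, exponent for ST(1)) are illustrative choices only.

```python
import math

def fxtract(st):
    """Return (significand, exponent) as FXTRACT would leave them in ST and ST(1)."""
    if st == 0.0:
        return (st, -math.inf)                 # zero operand: exponent is negative infinity
    mantissa, exp = math.frexp(st)             # st == mantissa * 2**exp, 0.5 <= |mantissa| < 1
    return (mantissa * 2.0, float(exp - 1))    # rescale so that 1 <= |significand| < 2
```

For instance, 46.0 is 1.4375 × 2^5, so fxtract(46.0) yields a significand of 1.4375 and an exponent of 5.0.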


Exceptions 
SF PE UE OE ZE DE IE 




FXTRACT 
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FYL2X 8087/80287/80387 
Compute Y × log_2 X


Legal Form
FYL2X                  ; temp ← log_2(ST); pop(); ST ← ST * temp


Description 


This instruction pops the top of stack, takes the base 2 logarithm, and multiplies the 
result by the new top of stack. Another way of expressing the function is: 


ST(1) × log_2 ST


The initial top of stack must be a positive value, 0 through infinity. If it is not, the 
results of the operation are undefined. 


You can also use this instruction to compute logarithms with a base other than 2, 
relying on the identity: 


log_n x = (log_2 x) / (log_2 n)


The following code fragment illustrates this computation. 


FLD1                   ; 1.0
FLD     n              ; n, 1.0
FYL2X                  ; log_2 n
FLD1                   ; 1.0, log_2 n
FDIVRP  ST(1), ST      ; 1/log_2 n  (ST(1) ← ST / ST(1); pop)
FLD     x              ; x, 1/log_2 n
FYL2X                  ; log_2 x * 1/log_2 n


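

The same computation can be checked numerically in Python. The function fyl2x below is a hypothetical model of the instruction (ST = x, ST(1) = y), and log_base mirrors the two FYL2X steps of the fragment above.

```python
import math

def fyl2x(x, y):
    """Model of FYL2X with ST = x and ST(1) = y: returns y * log2(x)."""
    return y * math.log2(x)

def log_base(n, x):
    """Logarithm of x to base n, computed with two FYL2X steps."""
    scale = 1.0 / fyl2x(n, 1.0)    # 1 / log2(n)
    return fyl2x(x, scale)         # log2(x) * (1 / log2(n))
```

With n = 10 and x = 1000 the result is 3 to within rounding error.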
Exceptions 


SF PE UE OE ZE DE IE 


FYL2X 
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FYL2XP1 8087/80287/80387 


Compute Y × log_2 (X + 1)


Legal Form
FYL2XP1                ; temp ← log_2(ST + 1.0); pop(); ST ← ST * temp


Description 


This instruction pops the top of stack, adds 1.0 to the value, takes the base 2 
logarithm, and multiplies the result by the new top of stack. Another way of ex- 
pressing the instruction is: 


ST(1) × log_2 (ST + 1.0)


The initial top of stack must be within the range -1 + √2/2 < X < 1 - √2/2
(roughly -0.29 < X < 0.29), or the result of the instruction is undefined.


This instruction is provided because adding 1.0 to the top of stack and then executing
FYL2X loses precision when X is close to 0. Because the FYL2XP1 function is
computed differently from the FYL2X instruction, a special range restriction exists.
FYL2XP1 is also useful in computing the arcsinh, arccosh, and arctanh inverse
hyperbolic trigonometric functions.
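

The precision problem is easy to demonstrate in Python, whose math.log1p plays the same role as FYL2XP1. The sketch below uses 64-bit floats and hypothetical function names; the range check reproduces the restriction stated above.

```python
import math

def fyl2xp1(x, y):
    """Model of FYL2XP1 with ST = x and ST(1) = y: y * log2(x + 1), accurate near 0."""
    limit = 1.0 - math.sqrt(2.0) / 2.0
    if not -limit < x < limit:
        raise ValueError("x is outside the documented FYL2XP1 range")
    return y * math.log1p(x) / math.log(2.0)

def add_then_fyl2x(x, y):
    """The naive alternative: add 1.0 first, then take y * log2 of the sum."""
    return y * math.log2(x + 1.0)

tiny = 1e-20
naive = add_then_fyl2x(tiny, 1.0)   # 1.0 + 1e-20 rounds to 1.0, so every digit is lost
precise = fyl2xp1(tiny, 1.0)        # about 1.44e-20
```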


Exceptions 
SF PE UE OE ZE DE IE 




Example 


Before 


FYL2XP1 
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F2XM1 8087/80287/80387 
Compute 2^x - 1


Legal Form
F2XM1                  ; ST ← 2^ST - 1


Description 


This instruction replaces the current top of stack (ST) with the value of the function
2^ST - 1. However, the initial operand value must be within the range -0.5 < x ≤ +0.5
or the result of the operation is undefined.


The function 2^x - 1, rather than the simpler 2^x, is provided to ensure precision when
x is near 0 (for example, when computing hyperbolic trigonometric functions).


Because the range of the F2XM1 instruction is narrow, subroutines to compute 2^x
must use FRNDINT and FSCALE to bring the operand into the legal range and scale
the result to the proper value.


You can compute the general function x^y by using the identity:
x^y = 2^(y × log_2 x)


and using the FYL2X and F2XM1 instructions.
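

The range reduction and the power identity can be sketched in Python. All function names are hypothetical. The integer part is taken with math.ceil(x - 0.5), an assumption chosen so that the remaining fraction stays inside the F2XM1 range; on the NDP the behavior of FRNDINT depends on the rounding control field.

```python
import math

def f2xm1(x):
    """Model of F2XM1: 2**x - 1 for an operand in the narrow documented range."""
    if not -0.5 < x <= 0.5:
        raise ValueError("operand outside the F2XM1 range")
    return math.expm1(x * math.log(2.0))

def pow2(x):
    """Compute 2**x by splitting x into an integer part and a small fraction."""
    n = math.ceil(x - 0.5)                        # integer part (the FRNDINT step)
    fraction = x - n                              # lies in (-0.5, 0.5]
    return math.ldexp(f2xm1(fraction) + 1.0, n)   # scale by 2**n (the FSCALE step)

def power(x, y):
    """Compute x**y for x > 0 from the identity x**y = 2**(y * log2 x)."""
    return pow2(y * math.log2(x))                 # FYL2X supplies y * log2 x
```

For example, power(2.0, 10.0) returns 1024.0, and power(9.0, 0.5) returns 3.0 to within rounding error.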


Exceptions 
SF PE UE OE ZE DE IE 


F2XM1 
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Appendix A 


POWERS 
OF TWO 


Exponent    Decimal Value    Hex Value
0           1                1
1           2                2
2           4                4
3           8                8
4           16               10
5           32               20
6           64               40
7           128              80
8           256              100
9           512              200
10          1024             400
11          2048             800
12          4096             1000
13          8192             2000
14          16384            4000
15          32768            8000
16          65536            10000
20          1048576          100000
31          2147483648       80000000
32          4294967296       100000000


Appendix B


ASCII
CHARACTER SET


Low-Order                    High-Order Bits
Bits         0000  0001  0010  0011  0100  0101  0110  0111
0000         NUL   DLE   SP    0     @     P     `     p
0001         SOH   DC1   !     1     A     Q     a     q
0010         STX   DC2   "     2     B     R     b     r
0011         ETX   DC3   #     3     C     S     c     s
0100         EOT   DC4   $     4     D     T     d     t
0101         ENQ   NAK   %     5     E     U     e     u
0110         ACK   SYN   &     6     F     V     f     v
0111         BEL   ETB   '     7     G     W     g     w
1000         BS    CAN   (     8     H     X     h     x
1001         HT    EM    )     9     I     Y     i     y
1010         LF    SUB   *     :     J     Z     j     z
1011         VT    ESC   +     ;     K     [     k     {
1100         FF    FS    ,     <     L     \     l     |
1101         CR    GS    -     =     M     ]     m     }
1110         SO    RS    .     >     N     ^     n     ~
1111         SI    US    /     ?     O     _     o     DEL
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OPCODE 
TABLES 


The following opcode tables aid in interpreting 80386/80486 object code. Use the
high-order 4 bits of the opcode as an index to a row of the opcode table; use the
low-order 4 bits as an index to a column of the table. If the opcode is 0FH, refer to
the 2-byte opcode table, and use the second byte of the opcode to index the rows
and columns of that table.
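

The indexing rule, together with the mod r/m field layout used later in this appendix (mod in bits 7 and 6, reg or nnn in bits 5 through 3, r/m in bits 2 through 0), can be expressed in Python as a brief illustration. The function names are hypothetical.

```python
def split_opcode(opcode):
    """Return (row, column) indexes into an opcode table for one opcode byte."""
    return (opcode >> 4) & 0xF, opcode & 0xF

def uses_two_byte_table(opcode):
    """An opcode of 0FH means the next byte indexes the 2-byte opcode table."""
    return opcode == 0x0F

def split_modrm(modrm):
    """Split a mod r/m byte into its (mod, reg or nnn, r/m) fields."""
    return (modrm >> 6) & 0x3, (modrm >> 3) & 0x7, modrm & 0x7
```

For example, opcode 8BH selects row 8, column B, and a mod r/m byte of C3H decodes to mod = 11, reg = 000, r/m = 011.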


Key to Abbreviations 


Operands are identified by a two-character code of the form Zz. The first character, 
an uppercase letter, specifies the addressing method; the second character, a lower- 
case letter, specifies the type of operand. 


Codes for Addressing Method 


A  Direct address. The instruction has no mod r/m byte; the address of the operand
is encoded in the instruction; no base register, index register, or scaling factor can
be applied. Example: far JMP (EA).


C  The reg field of the mod r/m byte selects a control register. Example: MOV
(0FH 20H, 0FH 22H).


D  The reg field of the mod r/m byte selects a debug register. Example: MOV
(0FH 21H, 0FH 23H).


E  A mod r/m byte follows the opcode and specifies the operand. The operand is
either a general register or a memory address. If it is a memory address, the address
is computed from a segment register and any of the following values: a base register,
an index register, a scaling factor, or a displacement.


F  Flags register.


G  The reg field of the mod r/m byte selects a general register. Example:
ADD (00H).


I  Immediate data. The value of the operand is encoded in subsequent bytes of the
instruction.


J  The instruction contains a relative offset to be added to the instruction pointer
register. Examples: JMP short, LOOP.


M  The mod r/m byte may refer only to memory. Examples: BOUND, LES,
LDS, LSS, LFS, LGS.


O  The instruction has no mod r/m byte; the offset of the operand is coded as a
word or doubleword (depending on address size attribute) in the instruction. No
base register, index register, or scaling factor can be applied. Example: MOV
(A0H-A3H).


R  The mod field of the mod r/m byte may refer only to a general register. Example:
MOV (0FH 20H, 0FH 26H).


S  The reg field of the mod r/m byte selects a segment register. Example:
MOV (8CH, 8EH).


T  The reg field of the mod r/m byte selects a test register. Example:
MOV (0FH 24H).


X  Memory addressed by DS:SI. Examples: MOVS, CMPS, OUTS, LODS.

Y  Memory addressed by ES:DI. Examples: MOVS, CMPS, INS, STOS, SCAS.


* Adapted and reprinted by permission of Intel Corporation, copyright © 1986.


Codes for Operand Type 


a  Two single-word operands in memory or two double-word operands in memory,
depending on operand size attribute (used only by BOUND).

b  Byte (regardless of operand size attribute).

c  Byte or word, depending on operand size attribute.

d  Doubleword (regardless of operand size attribute).

p  32-bit or 48-bit pointer, depending on operand size attribute.

s  6-byte pseudodescriptor.

v  Word or doubleword, depending on operand size attribute.

w  Word (regardless of operand size attribute).
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Register Codes 


When an operand is a register encoded in the opcode, the register is identified by 
its name, for example, AX, CL, or ESI. The name of the register indicates whether 
the register is 32 bits, 16 bits, or 8 bits. A register identifier of the form eXX is used 
when the width of the register depends on the operand size attribute. For example, 
eAX indicates that the AX register is used when the operand size attribute is 16 and 
that the EAX register is used when the operand size attribute is 32. 
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One-Byte Opcode Table 


— 
oy a = a oe a = ae = ee Iv 
pea 
os a s 2 a = ee - Ae a es - 
XOR XOR XOR XOR XOR XOR Ss. 
Eb,Gb Ev,Gv Gb,Eb Gv,Ev AL,Ib eAX, Iv 
eek nee en a pee a ed 
BOUND 
PUSHAD | POPAD OPSIZE: | ADRSIZE: 
Gv,Ma ae - 
JB pe JZ pe 
Jb Jb 
Group 1 | Group 1 Group 1 TEST TEST XCHG XCHG 
Eb, Ib Ev,Iv Ev,Ib Eb,Gb Ev,Gv Eb,Gb Ev,Gv 
NOP XCHG XCHG XCHG XCHG XCHG XCHG XCHG 
eCxX,eAX | eDX,eAX | eBX,eAX | eSPeAX | eBPeAX | eSLeAX | eDILeAX 
MOV MOV MOV MOV 
ALOb Ob, AL MOVSB |MOVSW/D| CMPSB |CMPSW/D 
MOV MOV MOV MOV MOV MOV MOV 
AL, Ib CL,Ib DL, Ib BL, Ib AH, Ib CH, Ib DH, Ib BH, Ib 
Group 2 | Group 2 |RET(near) RET(near) LES LDS MOV MOV 
Eb, Ib Ev,Ib_ Iw neat! Gv,Mp | GvMp | Eb Jb Ev,Iv 
Group 2 | Group 2 | Group 2 | Group 2 | 
LOOPNE | LOOPE LOOP JCXZ IN IN OUT OUT 
Jb Jb Jb Jb AL, Ib eAX,Ib Ib, AL Ib,eAX 
REP Group 3 | Group 3 


NOTE: All numbers are in hex. (continued) 


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One-Byte Opcode Table. continued 


8 9 A B C D E F 
OR OR OR OR OR OR a 2-byte 
Eb,Gb Ev,Gv Gb,Eb Gv,Ev AL,Ib eAX,Iv escape 
ae 
es = es ae _ re 4 nee Iv 
SUB 
CMP CMP CMP 
a aa os a ane a Gv,Ev AL, Ib eAX, Iv 
sta IMUL sie IMUL INSB INSW/D | OUTSB |JOUTSW/D 
Gv,Ev, Iv Gv,Ev,Ib | Yb,DX Yv,DX DX,Xb DxX,Xv 
yp I JLE 
J tb Jb Jb Jb 
MOV MOV 
Eb,Gb Ev,Gv ao nae ae ey oF . 


Ap Fv Fv 
TEST TEST 
STOSB |STOSW/D| LODSB |LODSW/D} SCASB |SCASW/D 

AL, Ib eAX,Iv 

MOV MOV | MOV MOV MOV MOV MOV MOV 
eAX,Iv eCxX,Iv eDX,Iv eBX,Iv eSP, Iv eBP Iv eSLIv eDI Iv 
ENTER RET far INT INT | 

ESC ESC ESC ESC ESC ESC ESC ESC 

0 1 Z 3 4 5 6 7 
CALL JMP JMP JMP IN IN OUT OUT 

AV Jv Ap | Jb AL,DX | eAX,DX | DX,AL |} DX,eAX 




Two-Byte Opcode Table (first byte is OFH) 


OV OV OV OV 
mak ane eee eee Hee Ban 


om “a _ 
ee se a — a res aes 
si ge ‘ SHLD SHLD MPXCHGICMPXCHG 
EV, oe Ev,Gv,Ib | Ev,Gv,CL | Eb,Rb Ev,Rv 
MOVZX | MOVZX 
oe = Gy, Eb Gv;Ew 
XADD 
ny Ev,Rv 


: <4 = 


a 
af 
Sd 
Be 
Cama 
ne 


(continued) 
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Two-Byte Opcode Table. continued 


8 9 A B C D E 


js JNS yp JNP JL JNL JLE JNLE 
Jv Jv JV Jv Jv Jv Jv Jv 
SETS SETNS SETP SETNP SETL SETNL SETLE | SETNLE 
Eb Eb Eb Eb Eb Eb Eb Eb 
PUSH POP BTS SHRD SHRD IMUL 
GS GS Ev,Gv_ | Ev,Gv,Ib | Ev,Gv,CL Gv,Ev 
Group 8 BIC BSF BSR MOVSX | MOVSX 
Ev,Ib Ev,Gv Gv,Ev Gv,Ev Gv,Eb Gv,Ew 
BSWAP | BSWAP | BSWAP | BSWAP | BSWAP | BSWAP | BSWAP | BSWAP 
EAX ECX EDX EBX ESP EBP ESI EDI 


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Opcodes Determined by Bits 5, 4, 
and 3 of mod r/m Byte: mod nnn r/m 


000 001 010 #11 100 

1 

2 

3 

4 

5 

6 

: 

tT LT Lt [# Te 


Numeric Data Processor Extensions 


The following tables show the opcode map to the 80386/80486 instruction set for 
the numeric data processor (NDP) extensions. The operand abbreviations for these 
tables are: 


Es      Effective address, short real (32-bit)
El      Effective address, long real (64-bit)
Et      Effective address, temp real (80-bit)
Ew      Effective address, word (16-bit)
Ed      Effective address, doubleword (32-bit)
Eq      Effective address, quadword (64-bit)
Eb      Effective address, BCD (80-bit)
Ea      Effective address (no operand size)
ST(i)   Stack element i
ST      Stack top
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Format: mod nnn r/m 
ESC 0 
nnn 
000 001 010 011 100 101 110 111 
FADD FMUL FCOM | FCOMP FSUB FSUBR FDIV FDIVR 
Es Es Es Es Es Es Es Es 
FADD FMUL FCOM | FCOMP FSUB FSUBR FDIV FDIVR 
ST,STG) | ST,STG | ST|ST@ | ST,ST@ | ST,STG | ST,ST@ | ST,STG | ST,ST@ 
i=r/m 
ESC 1 
nnn 
000 001 010 011 100 101 110 111 
FLD P |FLDENV]} FLDCW | FSTENV | FSTCW 
Es Ea Ew Ea Ew 
r/m FLD 
000 | STC) ST(O) —_ FCHS FLD1 F2XM1 | FPREM 
FLD FXCH 
001 
ST(1) ST(1) DP FABS FLDL2T | FYL2X |FYL2XP1 
FLD FXCH 
010 
sT@)_|_ST@)  catbmanh ane! ee 
FLD FXCH | 
011 TAN | FSIN 
sT3) | @ | gucaedid (ead 
LD FXCH 
100 FIST | FLDLG2 |FXTRACT | FRNDINT 
ST(4) 
FLD FXCH 
110 FLDZ |FDECSTP | FSIN 
ST) _| sT@ | ft | sn 
FLD FXCH 
111 FINCSTP | FCOS 
STM) ST) Pf Lf fancste 
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00 
mod=01 
10 


mod=11 


00 
mod=01 
10 


mod=11 


r/m 


00 


mod=01 
10 


mod=11 
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ESC 2 


nnn 


000 001 010 011 100 101 110 111 
FIADD | FIMUL FICOM | FICOMP | FISUB FISUBR | FIDIV FIDIVR 
Ew Ew Ew Ew Ew Ew Ew Ew 


*r/m=5 


ESC 3 


nnn 


000 001 010 011 100 101 110 111 
FILD FIST FISTP FLD FSTP 
Ew — Ew Ew Et Et 


Group 3a: mod=11, nnn=100 


000 001 010 O11 100 101 110 111 


ESC 4 


nnn 
110 111 


ADD me on cow te oe Dy en 
FMUL | FCOM | FCOMP | FSUB | FSUBR | FDIV | FDIVR 
STG),ST | ST@,ST | ST@,ST | STG@,ST | ST@,ST | ST@,ST | STG,ST | ST@,ST 


i=r/m 


Appendix C: Opcode Tables 


ESC 5 
nnn 
000 001 010 011 100 101 110 111 
00 
Seas FLD FST FSTP | FRSTOR FSAVE | FSTSW 
10 El El El Ea Ea Ew 
ie FFREE FST FSTP | FUCOM |FUCOMP 
mod=11 STC) st@ | sT@ | st@ | st@ 


i=r/m 


ESC 6 


nnn 


O11 1 110 111 
FIADD | FIMUL | FICOM | FICOMP| FISUB | FISUBR | FIDIV | FIDIVR 
Ed Ed Ed E Ed Ed 
FADDP | FMULP FCOMPP*| FSUBP | FSUBRP | FDIVP | FDIVRP 
ST@,ST | STG,ST STG),ST | STG,ST | ST@,ST | ST@,ST 


*r/m=001 


000 001 010 100 01 
Ed Ed d 


ESC 7 
nnn 


000 001 010 011 100 101 110 111 
FILD FIST FISTP FBLD FILD FBSTP | FISTP 
Ed Ed Ed Eb Eq Eb Eq 
FSTSW* 
AX 


*r/m=000 
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INSTRUCTION 
FORMAT AND 
TIMING 


This appendix describes the 80386-family instruction set. A table lists all instruc- 
tions with instruction encoding diagrams and clock counts. Details of the instruc- 
tion encoding are provided in the following sections, which describe the encoding 
structure and the definition of fields occurring within the instructions. 


80386/80486 Instruction Encoding 
and Clock Count Summary 


To calculate elapsed time for an instruction, multiply the instruction clock count by 
the processor clock period (for example, 40 ns for a processor operating at 25 MHz). 
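

As a quick worked example in Python (the function names are hypothetical), the clock period in nanoseconds is 1000 divided by the frequency in MHz, and the elapsed time is the clock count times that period.

```python
def clock_period_ns(frequency_mhz):
    """Clock period in nanoseconds for a processor running at frequency_mhz."""
    return 1000.0 / frequency_mhz

def elapsed_ns(clock_count, frequency_mhz):
    """Elapsed time in nanoseconds for an instruction that takes clock_count clocks."""
    return clock_count * clock_period_ns(frequency_mhz)
```

At 25 MHz the period is 40 ns, so an instruction listed at 3 clocks takes 120 ns under the assumptions given in this appendix.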


For more information on the encodings of instructions, refer to “Instruction
Encoding” (later in this appendix), which explains the structure of instruction
encodings and defines the encodings of instruction fields.


* Adapted and reprinted by permission of Intel Corporation, 1986.


Instruction clock count assumptions

1. The instruction has been prefetched and decoded and is ready for execution.

2. Bus cycles do not require wait states.

3. There are no local bus HOLD requests delaying processor access to the bus.

4. No exceptions are detected during instruction execution.

5. If an effective address is calculated, it does not use two general-register
   components. One register, scaling, and displacement can be used within the clock
   counts shown. However, if the effective address calculation uses two
   general-register components, add one clock to the clock count shown on the
   80386; one clock may be added on the 80486.

6. Accesses are aligned. Misaligned accesses require another memory read cycle.

7. On the 80486, one additional clock may be added under the following
   conditions:

   ■ The base register used as an effective address in one instruction is the
     destination register of the immediately preceding instruction.

   ■ Displacement mode addressing and immediate addressing are used in the
     same instruction.

8. A page translation hits the TLB.

9. The cache on the 80486 is enabled and the following conditions are true:

   ■ Cache fills complete before the next access to the same cache line.
   ■ JMP targets hit the cache.
   ■ No invalidate cycles occur.
   ■ Instructions that read consecutive memory words start on a 16-byte
     boundary.

10. In the 80386SX, add one read cycle for every 16 bits over the initial 16 bits
    accessed by the instruction.
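

Two of these adjustments are simple enough to state as arithmetic. The following Python sketch uses hypothetical function names and covers only assumption 5 (80386 case) and assumption 10; it is not a full timing model.

```python
def adjusted_clocks_80386(base_clocks, general_registers_in_ea):
    """Add one clock when the effective address uses two general-register components."""
    return base_clocks + (1 if general_registers_in_ea >= 2 else 0)

def extra_read_cycles_80386sx(bits_accessed):
    """One extra read cycle for every 16 bits beyond the first 16 bits accessed."""
    words = -(-bits_accessed // 16)      # number of 16-bit units, rounded up
    return max(0, words - 1)
```

A 32-bit memory operand on the 80386SX therefore costs one additional read cycle, and a 16-bit operand costs none.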


Instruction clock count notation

1. If two clock counts are given, the smaller one refers to a register operand, and
   the larger one refers to a memory operand.

2. n = number of times repeated.

3. m = number of components in the next instruction executed, where any
   displacement counts as one component, any immediate data counts as one
   component, and each of the other bytes of the instruction and prefix(es) counts
   as one component.


Instruction notes for table 


The following are instruction notes for the “General Notes” column of the table 
titled “80386/80486 Instruction Set Clock Summary,” which begins on page 421. 
The instruction notes for the “Cache Notes” column are found on page 438 as table 
footnotes. 


Notes a through c apply to real address mode only:


a. This is a protected-mode instruction. Trying to execute in real mode results in
   exception 6 (invalid opcode).

b. Exception 13 fault (general protection) occurs in real mode if an operand
   reference is made that partially or fully extends beyond the maximum CS, DS, ES,
   FS, or GS limit, FFFFH. Exception 12 fault (stack segment limit violation or not
   present) occurs in real mode if an operand reference is made that partially or
   fully extends beyond the maximum SS limit.

c. This instruction may be executed in real mode where it initializes the CPU for
   protected mode.


Notes d through g apply to real address mode and protected virtual address mode:


d. The 80386 and 80486 use an early-out multiply algorithm. The number of
   clocks depends on the position of the most significant bit in the operand
   (multiplier).

   Clock counts are minimum to maximum. To calculate actual clocks, use the
   following formula:

   Actual Clock = if m <> 0 then max ([log_2 |m|], 3) + 6 clocks
                  if m = 0 then 9 clocks (where m is the multiplier)

e. An exception might occur, depending on the value of the operand.

f. LOCK is asserted, regardless of the presence or absence of the LOCK prefix.

g. LOCK is asserted during descriptor table accesses.
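

The formula in note d can be evaluated with a short Python function. This sketch assumes that the brackets around the logarithm denote rounding up to the next integer (a ceiling); the source does not say so explicitly, and the function name is hypothetical.

```python
def early_out_multiply_clocks(m):
    """Clock count from the note d formula; m is the multiplier operand."""
    if m == 0:
        return 9
    magnitude = abs(m)
    ceil_log2 = (magnitude - 1).bit_length()   # ceiling of log2(|m|) for |m| >= 1 (assumed rounding)
    return max(ceil_log2, 3) + 6
```

Under that assumption, multipliers up to 8 in magnitude all give the 9-clock minimum, and the count grows by one clock each time the multiplier needs another significant bit.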


Notes h through r apply to protected virtual address mode only:


h. Exception 13 fault (general protection violation) occurs if the memory operand
   in CS, DS, ES, FS, or GS cannot be used because of a segment limit violation or
   because of an access rights violation. If a stack limit is violated, an exception 12
   (stack segment limit violation or not present) occurs.

i. For segment load operations, the CPL, RPL, and DPL must agree with the
   privilege rules to avoid an exception 13 fault (general protection violation). The
   segment's descriptor must indicate “present” or exception 11 (CS, DS, ES, FS, or GS
   not present) occurs. If the SS register is loaded and a stack segment not present is
   detected, an exception 12 (stack segment limit violation or not present) occurs.

j. All segment descriptor accesses in the GDT or LDT made by this instruction
   assert LOCK to maintain descriptor integrity in multiprocessor environments.

k. JMP, CALL, INT, RET, and IRET instructions referring to another code segment
   cause an exception 13 (general protection violation) if an applicable privilege
   rule is violated.

l. An exception 13 fault occurs if CPL is greater than 0. (0 is the most privileged
   level.)

m. An exception 13 fault occurs if CPL is greater than IOPL.

n. The IF bit of the flag register is not updated if CPL is greater than IOPL. The
   IOPL and VM fields of the flag register are updated only if CPL is equal to 0.

o. The PE bit of the MSW (CR0) cannot be reset by this instruction. Use MOV into
   CR0 to reset the PE bit.

p. Any violation of privilege rules as applied to the selector operand does not
   cause a protection exception; rather, the zero flag is cleared.

q. If the coprocessor's memory operand violates a segment limit or segment access
   rights, an exception 13 fault (general protection exception) occurs before the
   ESC instruction executes. An exception 12 fault (stack segment limit violation
   or not present) occurs if the stack limit is violated by the operand's starting
   address.

r. The destination of a JMP, CALL, INT, RET, or IRET must be in the defined limit
   of a code segment, or an exception 13 fault (general protection violation)
   occurs.




80386/80486 Instruction Set Clock Summary

Instruction | 80486 Clocks | Cache Miss Penalty | Cache Notes | 80386 Clocks | General Notes


General Data Transfer 
MOV = Move 


Register to register 1000100w mod reg r/m 


Register to memory 1000100w mod reg r/m 


Memory to register 1000101w mod reg r/m 




Immediate to register / short form 1011w_ reg 


/ long form 1100011w mod 000 r/m | immediate data 


mod 000 r/m immediate data 


Immediate to memory 1100011w 


Memory to accumulator 1010000w full displacement 


Accumulator to memory 1010001w full displacement 

Register to segment register (RM) 10001110 mod sreg 3r/m 
(protected mode) | 

Memory to segment register (RM) 10001110 mod sreg 3r/m 


(protected mode) 


| 10001100 | mod sreg 3r/m 


Segment register to register/memory 10001100 
MOVZX/MOVSX = Move zero/sign extension 


(z= 0 MOVZX/z = 1 MOVSX) 
1011z11Ww 
1011z11w 


Register to register 00001111 


Memory to register 00001111 
PUSH = Push 


Register / short form 01010 reg 


/ long form 11111111 mod 110 r/m 


mod 110 r/m 


> 


Memory 11111111 


Segment register / short form 000 sreg 2110 


/ long form 00001111 10 sreg 3000 




Immediate 011010s0 


PUSHA = Push all 01100000 


(continued) 
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80486 | Miss | Cache | 80386 | General 
Instruction cks \Penalty| Notes |Clocks | Notes 


POP = Pop 
Register / short form 01011 reg 


/ long form 10001111 mod 000 r/m 

Memory 10001111 mod 000 r/m 
Segment register / short (RM) 000sreg2111 

/ long (RM) 00001111 10 sreg 3001 

Protected mode / short 

/ long 

POPA = Pop all 01100001

XCHG = Exchange 

Register with register 1000011w mod reg r/m 

Memory with register 1000011w mod reg r/m 


Register with accumulator 10010 reg 


LEA = Load EA to register 10001101 


Segment Control 


LDS = Load pointer to DS 
(protected mode) | 


LES = Load pointer to ES 11000100 
(protected mode) 


LFS = Load pointer to FS 00001111 10110100 


(protected mode) 


LGS = Load pointer to GS 00001111 10110101 
(protected mode) 


LSS = Load pointer to SS 00001111 10110010


(protected mode) 


Flag Control

CLC = Clear carry flag                    11111000
CLD = Clear direction flag                11111100
CLI = Clear interrupt enable flag         11111010
CMC = Complement carry flag               11110101
LAHF = Load AH with flags                 10011111
POPF = Pop flags                          10011101
   (protected mode)
PUSHF = Push flags                        10011100
   (protected mode)
SAHF = Store AH into flags                10011110
STC = Set carry flag                      11111001
STD = Set direction flag                  11111101
STI = Set interrupt enable flag           11111011


Arithmetic

TTT = 0 / ADD = Add
TTT = 1 / OR = Logical OR
TTT = 2 / ADC = Add with carry
TTT = 3 / SBB = Subtract with borrow
TTT = 4 / AND = Logical AND
TTT = 5 / SUB = Subtract
TTT = 6 / XOR = Logical exclusive OR

Register to register                      00TTT0dw   mod reg r/m
Memory to register                        00TTT01w   mod reg r/m
Register to memory                        00TTT00w   mod reg r/m
Immediate to register                     100000sw   mod TTT r/m   immediate data
Immediate to accumulator                  00TTT10w   immediate data
Immediate to memory                       100000sw   mod TTT r/m   immediate data

INC = Increment
Register / short form                     01000 reg
         / long form                      1111111w   mod 000 r/m
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80486 | Miss 
Instruction Clocks |\Penalty 


Memory                                    1111111w   mod 000 r/m   (80386 clocks: 6; general notes: b, h)


DEC = Decrement

Register / short form                     01001 reg
         / long form                      1111111w   mod 001 r/m
Memory                                    1111111w   mod 001 r/m


NOT = Logical NOT 1111011w mod 010 r/m 
Register 
Memory 

NEG = Negate 1111011w mod 011 r/m 
Register 
Memory 


CMP = Compare 


001110dw mod reg r/m 


Register with register 


Memory with register 0011100w mod reg r/m 


Register with memory 0011101w mod reg r/m 


Immediate with register 100000sw mod 111 r/m immediate data 


mod 111 r/m immediate data 


Immediate with memory 100000sw 




Immediate with accumulator 0011110w 


TEST = Logical AND with no result but flags 


mod reg r/m 


Register with register 1000010w 


Memory with register 1000010w mod reg r/m 


Immediate with register 1111011w mod 000 r/m immediate data 


mod 000 r/m immediate data 


Immediate with memory 1111011w 




Immediate with accumulator 1010100w 


AAA = ASCII adjust for add 00110111 


AAS = ASCII adjust for subtract 00111111 




80486 | Miss | Cache | 80386 | General 
Instruction Clocks |Penalty| Notes | Clocks | Notes 
4 


DAA = Decimal adjust for add 00100111 2 


DAS = Decimal adjust for subtract 00101111


MUL= Unsigned multiply 

Accumulator with register 1111011w mod 100 r/m 
—byte 

—word 


—dword 


Accumulator with memory 1111011w mod 100 r/m 


—byte 
—word 
—dword 
IMUL = Integer multiply (signed) 
Accumulator with register 1111011w 


—byte 
—word 
—dword 


Accumulator with memory 1111011w mod 100 r/m 


—byte 
—word 
—dword 


Register with register 00001111 
—byte 


10101111 


—word 
—dword 


Register with memory 00001111 10101111 


—byte 


—word 


—dword 


(continued) 
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Instruction 


Register with immediate to register 
—byte 
—word 
—dword 

Memory with immediate to register 
—byte 
—word — 
—dword 

DIV = Divide (unsigned) 

Accumulator by register 
—byte 
—word 
—dword 

Accumulator by memory 
—byte 
—word 
—dword 

IDIV = Integer divide (signed) 
Accumulator by register 
'—byte 

—word 
—dword 

Accumulator by memory 
—byte 
—word 


—dword 


011010s1 immediate data 


011010s1 immediate data 


1111011w mod 110 r/m 


1111011w mod 110 r/m 


1111011w mod 111 r/m 


1111011w mod 111 r/m 


80486 | Miss | Cache | 80386 
Clocks \Penalty| Notes | Clocks 


General 
Notes 




Instruction 


AAD = ASCII adjust for divide 


AAM = ASCII adjust for multiply 
CBW = Convert byte to word 


CWD = Convert word to dword 


Logic 
Shift/Rotate 
TTT=0/ROL= Rotate left 
TTT =1/ROR= Rotate right 
TTT = 2/RCL= Rotate through carry left 
TTT =3/RCR= Rotate through carry right 
TTT =4/SHL/SAL = Shift left 
TTT =5/SHR= Shift right 
TTT =7/SAR= Shift arithmetic right 
Rotate through carry (RCL/RCR) 
Register by 1 
Memory by 1 
Register by CL 
Memory by CL 
Register immediate 


Memory immediate 


All others (ROL/ROR/SHL/SHR/SAL/SAR)


Register by 1 
Memory by 1 
Register by CL 
Memory by CL 
Register immediate 


Memory immediate 


fe oe 


Clocks |Penalty| Notes | Clocks | Notes 
4 19 
15 17 
3 3 
3 Z 


10011001 


1101000w mod TTT r/m 


1101000w mod TTT r/m


1101001w mod TTT r/m


1101001w mod TTT r/m


1100000w mod TTT r/m immed 8-bit data


1100000w mod TTT r/m immed 8-bit data 


1101000w mod TTT r/m 


1101000w mod TTT r/m 


1101001w mod TTT r/m 


1101001w mod TTT r/m 


1100000w mod TTT r/m immed 8-bit data 


1100000w mod TTT r/m immed 8-bit data


(continued) 
80486 | Miss | Cache | 80386 | General
Instruction Clocks |Penalty| Notes | Clocks | Notes 


SHRD/SHLD = Shift right/left double 

r=0/SHLD — r=1/SHRD 
Register by immediate 00001111 immed 8-bit data 
Memory by immediate 00001111 immed 8-bit data 
Register by CL 00001111 mod reg r/m 
Memory by CL 00001111 mod reg r/m 

BSWAP = Byte swap 00001111 

XADD = Exchange and add | 

_ Register to register 00001111 mod reg r/m 
Memory to register 00001111 mod reg r/m 

CMPXCHG = Compare and Exchange 
Register to register 00001111 mod reg r/m 


Memory to register 00001111 1011000w mod reg r/m 


String Instructions 

CMPS = Compare byte/word/dword 1010011w 
LODS = Load byte/word/dword 1010110w 
MOVS = Move byte/word/dword 1010010w 


SCAS = Scan byte/word/dword 1010111w 


STOS = Store byte/word/dword 1010101w 


REPE/REPNE CMPS = Repeated compare 
ECX =0 
ECX >0 
REP LODS = Repeated load 
ECX=0 
ECX>0 
REP MOVS = Repeated move 


ECX=0 


6cP 


ECX=1


80486 | Miss | Cache | 80386 | General 
Instruction , Clocks |Penalty| Notes | Clocks | Notes 
13 P 11 


ECX>1 12+3c PS 7+4c 


REPE/REPNE SCAS = Repeated scan 
ECX=0 
ECX>0 
REP STOS = Repeated store 
ECX=0 
ECX>0 


XLAT = Translate byte 11010111 


Bit Instructions 
BSF = Bit scan forward 
Register, register 00001111 10111100 mod reg r/m


Memory, register 00001111 10111100 mod reg r/m 


BSR = Bit scan reverse 


Register, register 00001111 mod reg r/m 

Memory, register 00001111 mod reg r/m 
BT = Bit test 

Register, immediate 00001111 mod 100 r/m | immed 8-bit data 

Memory, immediate 00001111 mod 100 r/m | immed 8-bit data 

Register, register 00001111 mod reg r/m 

Memory, register 00001111 mod reg r/m 


Bit modify 


TTT =5/BTS = Bit test and set 
TTT = 6/BTR = Bit test and reset


TTT =7/BTC= Bit test and complement 


Register, immediate 00001111 10111010 mod TTT r/m | immed 8-bit data 
Memory, immediate 00001111 10111010 mod TTT r/m_ | immed 8-bit data 


(continued) 
80486 | Miss | Cache | 80386 | General
Instruction Format Clocks |Penalty| Notes | Clocks | Notes 


Register, register 6 6 
Memory, ris 3 13 b,h 
SETcccc = Set byte on condition

cccc = 00 / SETO = Set if overflow
cccc = 01 / SETNO = Set if no overflow
cccc = 02 / SETB/SETNAE = Set if below/Set if not above or equal
cccc = 03 / SETNB/SETAE = Set if not below/Set if above or equal
cccc = 04 / SETE/SETZ = Set if equal/Set if zero
cccc = 05 / SETNE/SETNZ = Set if not equal/Set if not zero
cccc = 06 / SETBE/SETNA = Set if below or equal/Set if not above
cccc = 07 / SETNBE/SETA = Set if not below or equal/Set if above
cccc = 08 / SETS = Set if signed
cccc = 09 / SETNS = Set if not signed
cccc = 10 / SETP/SETPE = Set if parity/Set if parity even
cccc = 11 / SETNP/SETPO = Set if not parity/Set if parity odd
cccc = 12 / SETL/SETNGE = Set if less/Set if not greater or equal
cccc = 13 / SETNL/SETGE = Set if not less/Set if greater or equal
cccc = 14 / SETLE/SETNG = Set if less or equal/Set if not greater
cccc = 15 / SETNLE/SETG = Set if not less or equal/Set if greater

Register (condition true)
Register (condition false)
Memory (condition true)
Memory (condition false)




Conditional Branch 


Jcc = Jump byte on condition 


8-bit displacement 0111cccc 8-bit displacement
Full displacement 00001111 1000cccc full displacement


cccc = 00 / JO = Jump if overflow
cccc = 01 / JNO = Jump if no overflow


80486 | Miss | Cache | 80386 | General 
Instruction Clocks |Penalty| Notes | Clocks | Notes 


cccc = 02 / JB/JNAE = Jump if below/Jump if not above or equal
cccc = 03 / JNB/JAE = Jump if not below/Jump if above or equal
cccc = 04 / JE/JZ = Jump if equal/Jump if zero
cccc = 05 / JNE/JNZ = Jump if not equal/Jump if not zero
cccc = 06 / JBE/JNA = Jump if below or equal/Jump if not above
cccc = 07 / JNBE/JA = Jump if not below or equal/Jump if above
cccc = 08 / JS = Jump if signed
cccc = 09 / JNS = Jump if not signed
cccc = 10 / JP/JPE = Jump if parity/Jump if parity even
cccc = 11 / JNP/JPO = Jump if not parity/Jump if parity odd
cccc = 12 / JL/JNGE = Jump if less/Jump if not greater or equal
cccc = 13 / JNL/JGE = Jump if not less/Jump if greater or equal
cccc = 14 / JLE/JNG = Jump if less or equal/Jump if not greater
cccc = 15 / JNLE/JG = Jump if not less or equal/Jump if greater


Branch taken 
Branch not taken 

JCXZ/JECXZ = Jump if CX/ECX is zero 8-bit displacement 
Branch taken 
Branch not taken 

LOOP = Loop ECX times 8-bit displacement 
Branch taken 
Branch not taken 

LOOPE/LOOPZ = Loop if equal/Loop if zero 8-bit displacement 
Branch taken 
Branch not taken 

LOOPNE/LOOPNZ = Loop if not equal/Loop if not zero 8-bit displacement 
Branch taken 


Branch not taken 


(continued) 
80386/80486 Instruction Set Clock Summary. continued


80486 | Miss | Cache | 80386 
Instruction Clocks |Penalty| Notes | Clocks 


Control Transfer 

JMP = Unconditional branch 

Short 11101011 8-bit displacement 7+m

Direct within segment 11101001 full displacement 7+m 

Register indirect within segment 11111111 7+m 

Memory indirect within segment 11111111 10+m 

Direct intersegment (real mode) 11101010 unsigned full offset, selector 12+m 

Direct intersegment (protected mode) , 27+m 
Via call gate, same privilege 45+m 


Via task gate 43+task 44+task 
switch switch 


Via TSS 42+task 44+task 
switch switch 


Indirect intersegment (real mode) 11111111 mod 101 r/m 13 17+m 
Indirect intersegment (protected mode) 18 31+m 
Via call gate, same privilege 31 49+m 


Via task gate . 41+task 49+task 
switch switch 


Via TSS 42+task 49+task 
switch switch 


CALL = Call 

Direct within segment 11101000 full displacement 7+m 
Register indirect within segment 11111111 mod 010 r/m 7+m 
Memory indirect within segment 11111111 mod 010 r/m 10+m 
Direct intersegment (real mode) 10011010 unsigned full offset, selector 17+m 
Direct intersegment (protected mode) 34+m 


Via call gate, same privilege 52+m 


Via call gate, different privilege, no parameters 86+m 


Via call gate, different privilege, x parameters 77+4x 94+4x+m 


Via task gate 38+task 45+task 
switch switch 




80386 | General 
Clocks Notes 


45+task 
switch 


80486 | Miss | Cache 
Instruction Clocks |Penalty| Notes 
IJ 


37+task 
switch 


Via TSS 


Indirect intersegment (real mode) 11111111 mod 011 r/m


Indirect intersegment (protected mode) 


17 22+m 


20 38+m 


Via call gate, same privilege 35 
69 


77+4X 


52+m 
86+m 
94+4x+m 


Via call gate, different privilege, no parameters 


Via call gate, different privilege, x parameters 


Via task gate 38+task 


switch 


49+task 
switch 


Via TSS 37+task 


switch 


49+task 
switch 


RET = Return from call 


Within segment 11000011 b,g,h,r


Within segment adjusting ESP 11000010 16-bit displacement b,g,h,r 


Intersegment (real mode) 11001011 


Intersegment adjusting ESP (real mode) 11001010 16-bit displacement 


Intersegment (protected mode) 
Intersegment adjusting ESP (protected mode) 
Intersegment to different privilege level 
Intersegment to different privilege level adjusting ESP 
ENTER = Enter procedure 11001000 16-bit displacement, 8-bit level 
Level =0 
Level =1 


Level(1)>1 


| 


LEAVE = Leave procedure 11001001 


Software Interrupt 


INT3 = Debug interrupt int U int 
INTO = Interrupt on overflow 
Interrupt taken 2+int U 2+int 
Interrupt not taken 3 2 


(continued) 
80386/80486 Instruction Set Clock Summary. continued


Instruction 


11001101 
01100010 


INT = Interrupt n 
BOUND = Interrupt if out of range 
Interrupt taken 
Interrupt not taken 
IRET = Interrupt return 
Real mode/V86 mode 
Protected mode, same privilege 


Protected mode, different privilege 


Protected mode, nested task 


Basic interrupt times (INT) 
Real mode 
Protected mode via gate, same privilege 
Protected mode via gate, different privilege 


Protected mode via task gate 


V86 mode via gate 


V86 mode via task gate 


Basic task switch time (task switch) 
To 286 TSS 
To 386/486 TSS 
To V86 TSS 


Processor Control 


HLT = Halt 11110100 


MOV = Move to/from control or debug register 


00001111 00100010 


00001111 00100010 
00001111 00100000 


Register to CR0
Register to CR2-3


CRx to register 


80486 | Miss | Cache | 80386 | Ge 
Clocks |Penalty| Notes | Clocks | N 


U 


24+int 


7 


15 
20 
36 


32+task 
switch 


26 
44 
71 


37+task 
switch 


82 


37+task 
switch 


143 
162 
140 


11+int 


10 


pa 
38 
82 


16+task 
switch 


33 
59 
99 


50+task 
switch 


50+task 
switch 


232-237 
259-266 
178 


neral 
otes 
b,e 


b,e 


g,h,j,k,r
g,h,j,k,r




Instruction Format


DR0-3 to register 00001111 00100001 11 eee reg
DR6-7 to register 00001111 00100001 11 eee reg


80486 | Miss 
Clocks \Penalty 


Cache | 80386 | General 
Notes | Clocks Notes 
l 


Register to DR0-3 00001111 00100011

Register to DR6-7 00001111 00100011 

TRx to register 00001111 00100100 

Register to TRx 00001111 00100110 
CLTS = Clear task switched bit 00001111 00000110 


INVD = Invalidate cache 00001111 00001000 
WBINVD = Write back and invalidate cache 00001111 00001001
INVLPG = Invalidate TLB entry 00001111 00000001 mod 111 r/m 


NOP = No operation 10010000 


Prefix Bytes 
ADRSIZ = Address size override 01100111 
OPSIZ = Operand size override 01100110
LOCK = Bus lock 11110000 
CS = Code segment override 00101110 
DS = Data segment override 00111110 
ES = Extra segment override 00100110 
FS = FS segment override 01100100 


GS = GS segment override 01100101 


00110110 


SS = Stack segment override 




Protection Control 


ARPL = Adjust requested privilege level 


From register 9 
From memory 9 
LAR = Load access rights 
From register 11 
From memory 11 a,g,h,j,p 


(continued) 
80486 | Miss | Cache | 80386 | General
Instruction Clocks |\Penalty| Notes | Clocks | Notes 
LGDT = Load GDT register 00001111 mod 010 r/m 5 
LIDT = Load IDT register 00001111 mod 011 r/m 5
LLDT = Load LDT register 00001111 mod 010 r/m 


From register a) 


From memory 6 a,g,h,j,l 


LMSW = Load machine status word 00001111 00000001 mod 110 r/m


From register b,c,h,] 


From memory 1 b,c,h,l 


LSL = Load segment limit 00001111 00000011 mod reg r/m 


From register 3 a,g,h,j,p 


From memory 6 a,g,h,j,p 
LTR =Load task register 00001111 00000000 mod 001 r/m 


From register a,g,h,j,l 


From memory 


a,g,h,j,l
SGDT = Store GDT register 00001111 mod 000 r/m
SIDT = Store IDT register 00001111 mod 001 r/m
SLDT = Store LDT register 00001111 mod 000 r/m


b,c,h 


b,c,h 
To register a,h 
a,h 


To memory 


SMSW = Store machine status word 00001111 mod 100 _r/m 
To register 
To memory

STR = Store task register 00001111 mod 001 r/m


To register a,h 


To memory a,h


VERR = Verify read access 00001111 00000000 mod 100 r/m 


Register 3 a,g,h,j,p 


Memory 7 a,g,h,j,p 




Clk Count 
Virtual 80486 | Miss 80386 | General 
Instruction 8086 Mode Clocks |\Penalty Clocks | Notes 
VERW = Verify write access


Register 
Memory 
I/O Instructions


IN = Input from port 


Fixed port/ ee 26 
Variable port 27 
Real mode 
Protected mode (CPL <= IOPL) 
Protected mode (CPL>IOPL)
V86 mode 
OUT = Output to port 


Fixed port/ 1110011w 


Variable port 1110111w 


Real mode 


Protected mode (CPL<=IOPL)
Protect mode (CPL>IOPL) 
V86 mode 
INS = Input string 
Real mode 
Protected mode (CPL<=IOPL) 
Protect mode (CPL>IOPL) 
V86 mode
OUTS = Output string 
Real mode 
Protected mode (CPL<=IOPL) 
Protect mode (CPL>IOPL) 


V86 mode 


(continued) 
Clk Count
Virtual 
8086 Mode 


General 
Notes 


80486 | Miss | Cache | 80386 
Clocks |Penalty| Notes | Clocks 


Instruction 




REP INS = Repeated input string 


Real mode 


Protected mode (CPL<=IOPL) 


Protect mode (CPL>IOPL) 


V86 mode 


REP OUTS = Repeated output string 


Real mode 


Protected mode (CPL<=IOPL) 


Protect mode (CPL>IOPL) 


V86 mode 


Cache Notes: 


A. 


B. 
C. 


I. 
J. 


K. 


» Assuming that the operand address and stack address fall in 


different cache sets. 

Always locked, no cache hit case. 

Clocks = 10 + max(log2(|m|), n)
m = multiplier value (min clocks for m = 0)
n = 3/5 for ±m


Clocks = {quotient(count/operand length)}*7+9
= 8 if count ≤ operand length (8/16/32)

Clocks = {quotient(count/operand length)}*7+9
= 9 if count ≤ operand length (8/16/32)
Equal/not equal cases (penalty is the same regardless of lock). 


. Assuming that addresses for memory read (for indirection), stack 


push/pop, and branch fall in different cache sets. 


. Penalty for cache miss: add 6 clocks for every 16 bytes copied to 


new stack frame. 

Add 11 clocks for every unaccessed descriptor load. 

Refer to task switch clock counts table for value of TS. 

Add 4 extra clocks to the cache miss penalty for each 16 bytes. 


For notes L-M: (b = 0-3, nonzero byte number);
(i = 0-1, nonzero nibble number);
(n = 0-3, nonzero bit number in nibble);


L. 


M. 


11110010 0110110w 


11110010 0110111w 


Clocks = 8 + 4(b+1) + 3(i+1) + 3(n+1)
= 6 if second operand = 0

Clocks = 9 + 4(b+1) + 3(i+1) + 3(n+1)
= 7 if second operand = 0


For notes N—O: (n = bit position 0-31) 


N. 


O. 


ron 


@<G 


Clocks = 7 + 3(32-n) 

6 if second operand = 0 

Clocks = 8 + 3(32-n) 

7 if second operand = 0 

Assuming that the two string addresses fall in different cache sets. 


. Cache miss penalty: add 6 clocks for every 16 bytes compared. Entire 


penalty on first compare. 

Cache miss penalty: add 2 clocks for every 16 bytes of data. Entire 

penalty on first load. 

Cache miss penalty: add 4 clocks for every 16 bytes moved. (1 clock for 

the first operation and 3 for the second) 

Cache miss penalty: add 4 clocks for every 16 bytes scanned. (2 clocks 
each for first and second operations) 

Refer to interrupt clock counts table for value of INT. 

Clock count includes one clock for using both displacement and immediate. 


. Refer to assumption 6 in the case of a cache miss. 


Appendix D: Instruction Format and Timing 


Instruction Encoding 


All instruction encodings are subsets of the general instruction format shown in
Figure D-1. Instructions consist of one or two primary opcode bytes, possibly an
address specifier consisting of the mod r/m byte and scaled index byte, a
displacement if required, and an immediate data field if required.


Within the primary opcode or opcodes, smaller encoding fields can be defined. 
These fields vary according to the class of operation. The fields define information 
such as direction of the operation, size of the displacements, register encoding, and 
sign extension. 


Almost all instructions that refer to an operand in memory have an addressing mode
byte following the primary opcode byte(s). This byte, the mod r/m byte, specifies
the address mode to be used. Certain encodings of the mod r/m byte indicate a
second addressing byte, the scale-index-base byte, which fully specifies the
addressing mode.


Addressing modes can include a displacement immediately following the mod r/m 
byte or the scaled index byte. If a displacement is present, the possible sizes are 8, 
16, and 32 bits. 


If the instruction specifies an immediate operand, the immediate operand follows 
any displacement bytes. The immediate operand is always the last field of the 
instruction. 
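
As a minimal sketch of how these fields combine (this example is not part of the original text), the Python below assembles the register form of MUL from the pattern listed in the clock summary, `1111011w mod 100 r/m`. The helper name `encode_mul_reg` is hypothetical, and the sketch assumes no prefixes, displacement, or immediate data are needed.

```python
def encode_mul_reg(rm: int, w: int) -> bytes:
    """Assemble MUL with a register operand: opcode 1111011w, then mod r/m."""
    if not 0 <= rm <= 7 or w not in (0, 1):
        raise ValueError("rm must be 0-7 and w must be 0 or 1")
    opcode = 0b11110110 | w                    # w is the low bit of the opcode byte
    modrm = (0b11 << 6) | (0b100 << 3) | rm    # mod=11 (register), TTT=100, r/m
    return bytes([opcode, modrm])


# r/m = 001 with w = 1 selects CX or ECX, depending on the operand size.
print(encode_mul_reg(0b001, 1).hex())  # f7e1
```

The opcode byte comes first and the mod r/m byte follows it, which matches the field order described above.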


Figure D-1 illustrates some of the fields that can appear in an instruction, such as
the mod field and the r/m field. Several smaller fields also appear in certain
instructions, sometimes within the opcode bytes. The table on the following page is a
complete list of all fields appearing in the 80386-family instruction set. Detailed
tables for each field appear later in this appendix.


[Figure: the opcode (one or two bytes; T represents an opcode bit), followed by
the "mod r/m" byte and the "s-i-b" byte (register and address mode specifier),
the address displacement (4, 2, 1 bytes or none), and the immediate data
(4, 2, 1 bytes or none).]


Figure D-1. General instruction format. 
Fields within 80386 Instructions

Field Name   Description                                              Number of Bits
w            Specifies whether data is byte size or full size         1
             (full size is either 16 or 32 bits)
d            Specifies direction of data operation                    1
s            Specifies whether an immediate data field must be        1
             sign-extended
reg          General register specifier                               3
mod r/m      Address mode specifier (effective address can be a       2 for mod;
             general register)                                        3 for r/m
ss           Scale factor for scaled index address mode               2
index        General register to be used as index register            3
base         General register to be used as base register             3
sreg2        Segment register specifier for CS, SS, DS, ES            2
sreg3        Segment register specifier for CS, SS, DS, ES, FS, GS    3
cccc         For conditional instructions, specifies a condition      4
             asserted or a condition negated


NOTE: Figure D-1 shows encoding of individual instructions. 


32-bit extensions of the instruction set 


With the 80386, the 8086/80186/80286 instruction set is extended in two orthogonal 
directions: 32-bit forms of all 16-bit instructions support the 32-bit data types, and 
32-bit addressing modes are available for all instructions referring to memory. This 
orthogonal instruction set extension is accomplished by having a default (D) bit in 
the code segment descriptor and by having two prefixes to the instruction set. 


Whether the instruction defaults to operations of 16 bits or 32 bits depends on the 
setting of the D bit in the code segment descriptor. The D bit specifies the default 
length (either 16 bits or 32 bits) for both operands and effective addresses when 
executing that code segment. Real address mode and virtual 8086 mode use no 
code segment descriptors, but the 80386 internally assumes a D value of 0 when 
operating in those modes (for 16-bit default sizes compatible with the 
8086/80186/80286). 


Two prefixes, the operand size prefix and the effective address size prefix, allow 
overriding the default selection of operand size and effective address size. These 
prefixes can precede any opcode bytes and affect only the instruction they
precede. If necessary, one or both prefixes can be placed before the opcode bytes. The
presence of the operand size prefix and the effective address size prefix toggles the 
operand size or the effective address size to the value opposite that of the default 
setting. For example, if the default operand size is for 32-bit data operations, the 
presence of the operand size prefix toggles the instruction to 16-bit data operations. 
If the default effective address size is 16 bits, the presence of the effective address 
size prefix toggles the instruction to use 32-bit effective address computations. 
These 32-bit extensions are available in all 80386/80486 modes, including real
address mode and virtual 8086 mode. In these two modes the default is always 16 bits,
so prefixes are needed to specify 32-bit operands or addresses.
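
The toggling rule can be sketched in a few lines of Python. This is an illustration only: the function name `effective_sizes` is hypothetical, and the prefix values are the OPSIZ (01100110) and ADRSIZ (01100111) bytes listed in the prefix table of the clock summary.

```python
OPSIZ = 0x66    # operand size override prefix, 01100110
ADRSIZ = 0x67   # address size override prefix, 01100111


def effective_sizes(d_bit: int, prefixes: bytes = b"") -> tuple:
    """Return (operand size, address size) in bits for one instruction.

    d_bit is the D bit of the code segment descriptor; pass 0 for
    real address mode and virtual 8086 mode.
    """
    default = 32 if d_bit else 16
    toggled = 16 if d_bit else 32
    operand = toggled if OPSIZ in prefixes else default
    address = toggled if ADRSIZ in prefixes else default
    return operand, address


print(effective_sizes(1, bytes([OPSIZ])))          # (16, 32)
print(effective_sizes(0, bytes([OPSIZ, ADRSIZ])))  # (32, 32)
```

Each prefix is checked independently, so one instruction can override the operand size, the address size, or both.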


Unless otherwise specified, instructions with 8-bit and 16-bit operands do not affect
the contents of the high-order bits of the extended registers.


Encoding of instruction fields 


Several fields indicate register selection, addressing mode, and so on within the 
instruction. The encodings of these fields are defined in the following tables. 


Encoding of the operand length (w) field 

For any given instruction performing a data operation, the instruction executes as 
a 32-bit operation or a 16-bit operation. Within the constraints of the operation size, 
the w field encodes the operand size as either 1 byte or the full operation size, as 
shown in the table below. 


Operand Length Encoding

           Operand Size During        Operand Size During
w Field    16-Bit Data Operations     32-Bit Data Operations
0          8 bits                     8 bits
1          16 bits                    32 bits


Encoding of the general register (reg) field 

The general register is specified by the reg field, which can appear in the primary 
opcode bytes or as the reg field of the mod r/m byte, or as the r/m field of the mod 
r/m byte. The following tables illustrate reg field encoding. 


Encoding of reg Field When w Field Is Not Present in Instruction

             Register Selected During    Register Selected During
reg Field    16-Bit Data Operations      32-Bit Data Operations
000          AX                          EAX
001          CX                          ECX
010          DX                          EDX
011          BX                          EBX
100          SP                          ESP
101          BP                          EBP
110          SI                          ESI
111          DI                          EDI
Encoding of reg Field When w Field Is Present in Instruction


Register Specified by reg Field During 16-Bit Data Operations 


Function of w Field Function of w Field 
reg Field When w =0 When w= 1 
000 AL AX 
001 CL CX 
010 DL DX 
011 BL BX 
100 AH SP 
101 CH BP 
110 DH SI 
111 BH DI 


Encoding of reg Field When w Field Is Present in Instruction 


Register Specified by reg Field During 32-Bit Data Operations 


Function of w Field Function of w Field 
reg Field When w = 0 When w = 1
000 AL EAX 
001 CL ECX 
010 DL EDX 
011 BL EBX 
100 AH ESP 
101 CH EBP 
110 DH ESI 
111 BH EDI 
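
The three reg tables above reduce to a small lookup. The sketch below is illustrative only: `reg_name` is a hypothetical helper, and `w=None` stands for an instruction that has no w field.

```python
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def reg_name(reg: int, operand_size: int, w=None) -> str:
    """Map a 3-bit reg field to a register name.

    operand_size is 16 or 32; w is 0, 1, or None when the
    instruction has no w field.
    """
    if not 0 <= reg <= 7 or operand_size not in (16, 32):
        raise ValueError("reg must be 0-7 and operand_size 16 or 32")
    if w == 0:
        return REG8[reg]          # byte registers are the same in both sizes
    name = REG16[reg]
    return "E" + name if operand_size == 32 else name


print(reg_name(0b100, 32, w=0))  # AH
print(reg_name(0b100, 32, w=1))  # ESP
print(reg_name(0b110, 16))       # SI
```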


Encoding of the segment register (sreg) field 

The sreg field in certain instructions is a 2-bit field that allows one of the four 80286 
segment registers to be specified. The sreg field in other instructions is a 3-bit field 
that allows the FS and GS segment registers to be specified. The following two 
tables show the selected segment registers. 


2-Bit sreg2 Field 


2-Bit sreg2 Field Segment Register Selected 
00 ES 

01 CS 

10 SS 

11 DS 
3-Bit sreg3 Field


3-Bit sreg3 Field Segment Register Selected 
000 ES 

001 CS 

010 SS 

011 DS 

100 FS 

101 GS 

110 Do not use 

111 Do not use 


Encoding of address mode 

Except for special instructions such as PUSH and POP, where the addressing mode 
is predetermined, the addressing mode for the current instruction is specified by 
addressing bytes following the primary opcode. The primary addressing byte is the 
mod r/m byte, and a second byte of addressing information, the s-i-b (scale-index- 
base) byte, can be specified. 


The s-i-b byte is specified when using 32-bit addressing mode and the mod r/m 
byte has r/m = 100 and mod = 00, 01, or 10. When the s-i-b byte is present, the 32-bit 
addressing mode is a function of the mod, ss, index, and base fields. 


The primary addressing byte, the mod r/m byte, also contains 3 bits (shown as TTT 
in Figure D-1) sometimes used as an extension of the primary opcode. The 3 bits, 
however, can also be used as a register field (reg). 
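
As an illustration of the byte layout described above, the following Python sketch splits a mod r/m byte into its three fields and applies the s-i-b rule (32-bit addressing, r/m = 100, mod = 00, 01, or 10). The names `split_modrm` and `has_sib` are hypothetical.

```python
def split_modrm(byte: int) -> tuple:
    """Split a mod r/m byte into (mod, reg or TTT, r/m)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    mod = byte >> 6            # bits 7-6
    reg = (byte >> 3) & 0b111  # bits 5-3: register field or opcode extension
    rm = byte & 0b111          # bits 2-0
    return mod, reg, rm


def has_sib(byte: int, address_size: int = 32) -> bool:
    """Report whether an s-i-b byte follows this mod r/m byte."""
    mod, _, rm = split_modrm(byte)
    return address_size == 32 and rm == 0b100 and mod != 0b11


print(split_modrm(0xE1))  # (3, 4, 1)
print(has_sib(0x04))      # True
```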


When calculating an effective address, either 16-bit addressing or 32-bit addressing 
is used. To calculate the effective address, 16-bit addressing uses 16-bit address com- 
ponents, whereas 32-bit addressing uses 32-bit address components. When 16-bit 
addressing is used, the mod r/m byte is interpreted as a 16-bit addressing mode 
specifier. When 32-bit addressing is used, the mod r/m byte is interpreted as a 32- 
bit addressing mode specifier. 


The following tables define all encodings of all 16-bit addressing modes and 32-bit 
addressing modes. 
Encoding of 32-Bit Address Mode with mod r/m Byte
(no s-i-b Byte Present)

mod r/m    Effective Address
00 000     DS:[EAX]
00 001     DS:[ECX]
00 010     DS:[EDX]
00 011     DS:[EBX]
00 100     s-i-b is present
00 101     DS:[d32]
00 110     DS:[ESI]
00 111     DS:[EDI]

01 000     DS:[EAX+d8]
01 001     DS:[ECX+d8]
01 010     DS:[EDX+d8]
01 011     DS:[EBX+d8]
01 100     s-i-b is present
01 101     SS:[EBP+d8]
01 110     DS:[ESI+d8]
01 111     DS:[EDI+d8]

10 000     DS:[EAX+d32]
10 001     DS:[ECX+d32]
10 010     DS:[EDX+d32]
10 011     DS:[EBX+d32]
10 100     s-i-b is present
10 101     SS:[EBP+d32]
10 110     DS:[ESI+d32]
10 111     DS:[EDI+d32]

11 000     register (see below)
11 001     register (see below)
11 010     register (see below)
11 011     register (see below)
11 100     register (see below)
11 101     register (see below)
11 110     register (see below)
11 111     register (see below)

Register Specified by reg or r/m During 16-Bit Data Operations

           Function of w Field    Function of w Field
mod r/m    When w = 0             When w = 1
11 000     AL                     AX
11 001     CL                     CX
11 010     DL                     DX
11 011     BL                     BX
11 100     AH                     SP
11 101     CH                     BP
11 110     DH                     SI
11 111     BH                     DI
Encoding of 32-Bit Address Mode with mod r/m Byte
(no s-i-b Byte Present)

Register Specified by reg or r/m During 32-Bit Data Operations

           Function of w Field    Function of w Field
mod r/m    When w = 0             When w = 1
11 000     AL                     EAX
11 001     CL                     ECX
11 010     DL                     EDX
11 011     BL                     EBX
11 100     AH                     ESP
11 101     CH                     EBP
11 110     DH                     ESI
11 111     BH                     EDI


Encoding of 32-Bit Address Mode (mod r/m Byte and s-i-b Byte Present) 


mod base    Effective Address
00 000      DS:[EAX+(scaled index)]
00 001      DS:[ECX+(scaled index)]
00 010      DS:[EDX+(scaled index)]
00 011      DS:[EBX+(scaled index)]
00 100      SS:[ESP+(scaled index)]
00 101      DS:[d32+(scaled index)]
00 110      DS:[ESI+(scaled index)]
00 111      DS:[EDI+(scaled index)]

01 000      DS:[EAX+(scaled index)+d8]
01 001      DS:[ECX+(scaled index)+d8]
01 010      DS:[EDX+(scaled index)+d8]
01 011      DS:[EBX+(scaled index)+d8]
01 100      SS:[ESP+(scaled index)+d8]
01 101      SS:[EBP+(scaled index)+d8]
01 110      DS:[ESI+(scaled index)+d8]
01 111      DS:[EDI+(scaled index)+d8]

10 000      DS:[EAX+(scaled index)+d32]
10 001      DS:[ECX+(scaled index)+d32]
10 010      DS:[EDX+(scaled index)+d32]
10 011      DS:[EBX+(scaled index)+d32]
10 100      SS:[ESP+(scaled index)+d32]
10 101      SS:[EBP+(scaled index)+d32]
10 110      DS:[ESI+(scaled index)+d32]
10 111      DS:[EDI+(scaled index)+d32]


NOTE: Mod field in mod r/m byte; ss, index, base fields in s-i-b byte.


ss    Scale Factor
00    x1
01    x2
10    x4
11    x8
index    Index Register
000      EAX
001      ECX
010      EDX
011      EBX
100      no index reg*
101      EBP
110      ESI
111      EDI


* When index field is 100, indicating no index register, ss field must equal 00. If index is 100 and ss does 
not equal 00, the effective address is undefined. 
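
To make the s-i-b tables concrete, here is a rough Python sketch of the address calculation. It is an illustration under stated assumptions, not a reference decoder: `sib_effective_address` and the `regs` dictionary are hypothetical, the caller is assumed to pass a displacement that is already sign-extended, and the segment shown is the default one (ignoring any segment override prefix).

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def sib_effective_address(mod: int, sib: int, regs: dict, disp: int = 0) -> tuple:
    """Return (default segment, 32-bit offset) for a mod r/m + s-i-b operand."""
    if mod not in (0, 1, 2):
        raise ValueError("mod 11 selects a register, not memory")
    ss, index, base = sib >> 6, (sib >> 3) & 0b111, sib & 0b111
    if index == 0b100:                      # no index register
        if ss != 0:
            raise ValueError("index 100 with ss != 00 is undefined")
        scaled = 0
    else:
        scaled = regs[REG32[index]] << ss   # scale by 1, 2, 4, or 8
    if mod == 0 and base == 0b101:          # d32 takes the place of the base
        segment, base_value = "DS", 0
    else:
        segment = "SS" if base in (0b100, 0b101) else "DS"
        base_value = regs[REG32[base]]
        if mod == 0:
            disp = 0                        # mod 00 carries no displacement
    return segment, (base_value + scaled + disp) & 0xFFFFFFFF


regs = {"EAX": 0x1000, "ECX": 4, "ESP": 0x2000}
# s-i-b 10 001 000: ss = x4, index = ECX, base = EAX
print(sib_effective_address(0, 0b10001000, regs))
```

With these register values the call yields the DS segment and offset 0x1010, that is, EAX + ECX*4.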


Encoding of 16-Bit Address Mode with mod r/m Byte 


mod r/m    Effective Address
00 000     DS:[BX+SI]
00 001     DS:[BX+DI]
00 010     SS:[BP+SI]
00 011     SS:[BP+DI]
00 100     DS:[SI]
00 101     DS:[DI]
00 110     DS:[d16]
00 111     DS:[BX]

01 000     DS:[BX+SI+d8]
01 001     DS:[BX+DI+d8]
01 010     SS:[BP+SI+d8]
01 011     SS:[BP+DI+d8]
01 100     DS:[SI+d8]
01 101     DS:[DI+d8]
01 110     SS:[BP+d8]
01 111     DS:[BX+d8]

10 000     DS:[BX+SI+d16]
10 001     DS:[BX+DI+d16]
10 010     SS:[BP+SI+d16]
10 011     SS:[BP+DI+d16]
10 100     DS:[SI+d16]
10 101     DS:[DI+d16]
10 110     SS:[BP+d16]
10 111     DS:[BX+d16]

11 000     register (see below)
11 001     register (see below)
11 010     register (see below)
11 011     register (see below)
11 100     register (see below)
11 101     register (see below)
11 110     register (see below)
11 111     register (see below)
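
The 16-bit address modes can be sketched the same way. The Python below is an illustration only: `ea16` and the `regs` dictionary are hypothetical, the displacement is assumed to be sign-extended by the caller, and the result wraps at 64 KB as 16-bit address arithmetic does.

```python
# (default segment, registers summed) for each r/m value, per the 16-bit table
MODES16 = [
    ("DS", ("BX", "SI")), ("DS", ("BX", "DI")),
    ("SS", ("BP", "SI")), ("SS", ("BP", "DI")),
    ("DS", ("SI",)), ("DS", ("DI",)),
    ("SS", ("BP",)), ("DS", ("BX",)),
]


def ea16(mod: int, rm: int, regs: dict, disp: int = 0) -> tuple:
    """Return (default segment, 16-bit offset) for a 16-bit mod r/m operand."""
    if mod == 0b11:
        raise ValueError("mod 11 selects a register, not memory")
    if mod == 0 and rm == 0b110:            # special case: DS:[d16]
        return "DS", disp & 0xFFFF
    segment, names = MODES16[rm]
    if mod == 0:
        disp = 0                            # mod 00 carries no displacement
    return segment, (sum(regs[n] for n in names) + disp) & 0xFFFF


regs = {"BX": 0x0100, "SI": 0x0020, "DI": 0x0000, "BP": 0xFFF0}
print(ea16(0, 0b000, regs))             # DS:[BX+SI]
print(ea16(1, 0b110, regs, disp=0x20))  # SS:[BP+d8], wraps past 0xFFFF
```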




Encoding of 16-Bit Address Mode with mod r/m Byte 
Register Specified by r/m During 16-Bit Data Operations 


Function of w Field Function of w Field 

mod r/m When w = 0 When w = 1 

11 000 AL AX 

11 001 CL CX 

11 010 DL DX 

11011 BL BX 

11 100 AH SP 

11 101 CH BP 

11 110 DH SI 

11111 BH DI 


Encoding of 16-Bit Address Mode with mod r/m Byte 
Register Specified by r/m During 32-Bit Data Operations 


Function of w Field Function of w Field 

mod r/m When w=0 When w= 1 

11 000 AL EAX 

11 001 CL ECX 

11 010 DL EDX 

11 011 BL EBX 

11 100 AH ESP 

11 101 CH EBP 

11110 DH ESI 

11111 BH EDI 


Encoding of operation direction (d) field 
In many 2-operand instructions, the d field indicates which operand is the source 
and which is the destination, as shown in the following table. 


Operation Direction Encoding 


d    Direction of Operation
0    Register/Memory <-- Register
     reg field indicates source operand; mod r/m or mod ss index base
     indicates destination operand
1    Register <-- Register/Memory
     reg field indicates destination operand; mod r/m or mod ss index
     base indicates source operand


Encoding of sign extend (s) field 

The s field occurs in instructions with immediate data fields. The s field has an 
effect only if the size of the immediate data is 8 bits and is being placed in a 16-bit 
or 32-bit destination. The table on the following page shows s field encoding. 
Sign Extend Encoding

s    Effect on Immediate Data 8       Effect on Immediate Data 16/32
0    None                             None
1    Sign extend data 8 to fill       None
     16-bit or 32-bit destination
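
A short Python sketch of the s = 1 case, in which an 8-bit immediate is sign-extended to fill the destination. The helper name `sign_extend8` is hypothetical and not part of the original text.

```python
def sign_extend8(imm8: int, operand_size: int) -> int:
    """Sign-extend an 8-bit immediate to a 16-bit or 32-bit value."""
    if not 0 <= imm8 <= 0xFF or operand_size not in (16, 32):
        raise ValueError("imm8 must be 0-255 and operand_size 16 or 32")
    if imm8 & 0x80:                                   # bit 7 set: negative
        return imm8 | ((1 << operand_size) - 0x100)   # fill the upper bits with 1s
    return imm8


print(hex(sign_extend8(0xFE, 16)))  # 0xfffe
print(hex(sign_extend8(0x7F, 32)))  # 0x7f
```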


Encoding of conditional test (cccc) field

For the conditional instructions (conditional jumps and set on condition), cccc is
encoded with the condition to test. The following table shows encoding of the
cccc field.


Conditional Test Encoding 


Mnemonic Condition cccc 
O Overflow 0000 
NO No overflow 0001 
B/NAE Below/not above or equal 0010 
NB/AE Not below/above or equal 0011 
E/Z Equal/zero 0100 
NE/NZ Not equal/not zero 0101 
BE/NA Below or equal/not above 0110 
NBE/A Not below or equal/above 0111 
S Sign 1000 
NS Not sign 1001 
P/PE Parity/parity even 1010 
NP/PO Not parity/parity odd 1011 
L/NGE Less than/not greater or equal 1100 
NL/GE Not less than/greater or equal 1101 
LE/NG Less than or equal/not greater than 1110
NLE/G Not less or equal/greater than 1111 
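
The cccc encodings pair up: the upper three bits select a test and the low bit negates it. The Python sketch below illustrates this. The function name `condition_true` is hypothetical, and the flag tests (carry, zero, sign, overflow, parity) are the standard x86 definitions of these conditions rather than something this appendix spells out.

```python
def condition_true(cccc: int, cf=0, zf=0, sf=0, of=0, pf=0) -> bool:
    """Evaluate a 4-bit condition code against the given flag values."""
    if not 0 <= cccc <= 15:
        raise ValueError("cccc is a 4-bit field")
    tests = [
        of,                # 000x  O   / NO
        cf,                # 001x  B   / NB
        zf,                # 010x  E   / NE
        cf or zf,          # 011x  BE  / NBE
        sf,                # 100x  S   / NS
        pf,                # 101x  P   / NP
        sf != of,          # 110x  L   / NL
        zf or sf != of,    # 111x  LE  / NLE
    ]
    result = bool(tests[cccc >> 1])
    return not result if cccc & 1 else result   # low bit negates the test


print(condition_true(0b0100, zf=1))        # E/Z with ZF set: True
print(condition_true(0b1100, sf=1, of=0))  # L/NGE with SF != OF: True
```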


Encoding of control, debug, and test registers (eee) field 
The eee field loads and stores the control, debug, and test registers. 


Encoding of eee When Interpreted as Control Register Field 


eee Code    Reg Name
000         CR0
010         CR2
011         CR3


Do not use any other encoding. 
Encoding of eee When Interpreted as Debug Register Field

eee Code    Reg Name
000         DR0
001         DR1
010         DR2
011         DR3
110         DR6
111         DR7

Do not use any other encoding.


Encoding of eee When Interpreted as Test Register Field 


eee Code    Reg Name
011         TR3
100         TR4
101         TR5
110         TR6
111         TR7


Do not use any other encoding. 


Floating-Point Extensions 


The table beginning on the following page shows NDP extensions to the basic 
instruction set. In the 80486, these instructions are implemented on-chip. An 80387 
is required to implement these instructions on 80386-based systems. 


Instruction Encoding/Timing 


| woe ha 
INSTRUCTION 80387 80486 


Byte Byte Optional 32-Bit | 32-Bit | 64-Bit 16-Bit | 32-Bit | 32-Bit | 64-Bit | 16-Bit 
0 1 Bytes 2-6 Real Integer Real Integer Real Integer Real Integer 


Data Transfer 
FLD = Load? 
Integer/real memory to ST(O) 3(@2) 3 (3) . 3Q3) 13-16 (2) 
Long integer memory to ST(O) 10-18 (3) 
Extended real memory to ST(0) 6(4) 
BCD memory to ST(0) ESC_111 70-103 (4) 
ST(@) to ST(O) ESC 001 4 
FST = Store 
ST(0) to integer/real memory ESC MF 1 SIB/DISP 
ST() to STG) ESC_101 
FSTP = Store and pop 
ST(0) to integer/real memory ESC MF 1 28-34 
ST(0) to long integer memory ESC 111 29-34 
ST(0) to extended ai ESC 011 53 6 
ST(O) to BCD memory ESC 111 SIB/DISP 512-534 172-176 
STO) toST@ ESC_101 12 3 
FXCH = Exchange 


ST@ and ST(0) ESC 001 


Comparison 
FCOM = Compare 


Integer/real memory to ST(O) 4(2) 15-17(2) 4(3) 16-20(2) 
ST@ to ST) 


FCOMP = Compare and pop 
Integer/real memory to ST ESC MF 0 4(2) 15-17 (2) 4(3) 16-20 (2) 


ST() to STO) 4 




FCOMPP = Compare and pop twice 


STG to STO) 


FTST = Test ST(O) ESC 001 11100100 
FUCOM = Unordered compare 


FUCOMP = Unordered compare | ESC 101 11101 ST(i) 
and pop 


FUCOMPP = Unordered 

compare and pop twice 
FXAM = Examine ST(0) 
Constants 

FLDZ = Load +0.0 into ST(0)

FLD1 = Load +1.0 into ST(0)

FLDPI = Load pi into ST(0)

FLDL2T = Load log2(10) into ST(0)

FLDL2E = Load log2(e) into ST(0) 11101010
FLDLG2 = Load log10(2) into ST(0)

FLDLN2 = Load loge(2) into ST(0) 11101101


Arithmetic 


FADD = Add 


Integer/real memory with ST() 24-32 57-72 29-37 71-85 8-20(2) =: 19-32(2) 82003) ~— 20-352) 
ST(i) and STO) | 23-31» 8-20 
FSUB = Subtract 2 


Integer/real memory with ST(0) 24-32 57-82 28-36 71-83¢ 82072) 18-32(2) = 8-203) ~—s 20-352) 
ST(@ and ST) 26-344 8-20 


FMUL = Multiply 


Integer/real memory with ST(0) ESC_MF 0 27-35 61-82 32-57 11(2) —-22-24(2) ss «14(3)—s-23--27(2) 
STG and STO) . ESC d P 0 11001R/M 29-57¢ 16 
FDIV = Divide 


Integer/real memory with ST(O) ESC MF 0 MOD _11R_R/M 89 120—127f 94 136-1408 73) 84-86(2) 73) 85-89(2) 


ST@MandSTO)  . ESC d P 0 1111 R R/M 88h 


FSQRT = Square root (note i)             ESC 001   11111010   122-129
FSCALE = Scale ST(0) by ST(1)            ESC 001   11111101   67-86
FPREM = Partial remainder                ESC 001   11111000   74-155
FPREM1 = Partial remainder (IEEE)        ESC 001   11110101   95-185
FRNDINT = Round ST(0) to integer         ESC 001   11111100   66-80
FXTRACT = Extract components of ST(0)    ESC 001   11110100   70-76
FABS = Absolute value of ST(0)           ESC 001   11100001   22
FCHS = Change sign of ST(0)              ESC 001   11100000


Transcendental 


FCOS = Cosine of ST(0) (note k)                 ESC 001   11111111   123-772 (note j)   193-279
FPTAN = Partial tangent of ST(0) (note k)       ESC 001   11110010   191-497 (note j)   200-273
FPATAN = Partial arctangent                     ESC 001   11110011   314-487            218-303
FSIN = Sine of ST(0) (note k)                   ESC 001   11111110   122-771 (note j)   193-279
FSINCOS = Sine and cosine of ST(0) (note k)     ESC 001   11111011   194-809 (note j)   243-392
F2XM1 = 2^ST(0) - 1 (note l)                    ESC 001   11110000   211-476            140-279
FYL2X = ST(1) * log2(ST(0)) (note m)            ESC 001   11110001   120-538            196-329
FYL2XP1 = ST(1) * log2(ST(0) + 1.0) (note n)    ESC 001   11111001




Processor Control 
FINIT = Initialize NPX 33 
FSTSW AX = Store status word 13 
FLDCW = Load control word ESC 001 19 
FSTCW = Store control word 15 
FSTSW = Store status word 15 
FCLEX = Clear exceptions 11 
FSTENV = Store environment 103-104 56-67 
FLDENV = Load environment 71 34-44 
FSAVE = Save state 375-376 143-154 
FRSTOR = Restore state 308 120-131 (23-27) 
FINCSTP = Increment stack pointer 21 3 
FDECSTP = Decrement stack pointer 22 3 
FFREE = Free ST(i) 18 3 
FNOP = No operation 12 3 
NOTES: 
a. When loading single-precision or double-precision 0 from memory, add 5 clocks. 
b. Add 3 clocks to the range when d = 1. 
c. Add 1 clock to each range when R = 1. 
d. Add 3 clocks to the range when d = 0. 
e. Typical = 52. (When d = 0, 46-54, typical = 49.) 
f. Add 1 clock to the range when R = 1. 
g. 135-141 when R = 1. 
h. Add 3 clocks to the range when d = 1. 
i. -0 ≤ ST(0) ≤ +∞. 
j. These timings hold for operands in the range |x| < π/4. For operands not in 
this range, up to 76 additional clocks might be needed to reduce the operand. 
k. 0 < |ST(0)| < 2^63. 
l. -1.0 < ST(0) < 1.0. 
m. 0 < ST(0) < ∞, -∞ < ST(1) < +∞. 
n. 0 < |ST(0)| < (2 - SQRT(2))/2, -∞ < ST(1) < +∞. 


Appendix E 


INSTRUCTION 
DISASSEMBLY 
TABLE 


The table in this appendix allows you to decode 80386 instructions. It presents the 
same information as the opcode table in Appendix C but is easier to use. 


The table has the following format: 
[required byte(s)] [operand byte(s)] [instruction] 


At least one of the required bytes is an 8-bit hexadecimal value, and additional 
bytes may follow. The operand bytes have one of the following forms: 


ea The source and destination operands are encoded in the standard mod reg r/m 
format described in Appendix D. 


ea/N The destination operand is encoded in the mod r/m portion of the ea field, 
and the reg bits are set to N. 


dataN N bytes of immediate data follow the instruction. 


—/n/reg The standard mod reg r/m encoding is interpreted so that the mod bits 
are ignored, the reg bits specify register n of a group (such as CR3), and the r/m bits 
select a general 32-bit register. 


dispN A signed displacement (N bits in length) from the current instruction 
pointer (EIP) follows the instruction. 


The abbreviations Ea, Eb, Ew, and Ed stand for the effective address, byte, word, 
and doubleword indicated by the ea bits in the instruction. 


Instructions preceded by an asterisk (*) are 32-bit instructions that operate on 16-bit 
quantities when preceded with the OPSIZ: instruction prefix. For real mode, V86 
mode, and 286-compatible code segments, the behavior is reversed; that is, the 
instructions operate on 16-bit operands unless preceded with the OPSIZ: prefix. 
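
The ea and ea/N forms both depend on splitting the byte after the opcode into its mod, reg, and r/m fields. The sketch below illustrates that split under the usual layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0). The function names and the `GROUP_80` list are hypothetical helpers; the list simply restates the eight `80 ea/N data8` rows of the table.

```python
def split_modrm(byte):
    """Split a mod reg r/m byte into its three fields."""
    mod = (byte >> 6) & 0b11
    reg = (byte >> 3) & 0b111
    rm = byte & 0b111
    return mod, reg, rm


# Operations selected by /N for opcode 80H, in order /0 through /7.
GROUP_80 = ["ADD", "OR", "ADC", "SBB", "AND", "SUB", "XOR", "CMP"]


def group_80_mnemonic(modrm_byte):
    """Return the mnemonic for an '80 ea/N data8' instruction."""
    _, reg, _ = split_modrm(modrm_byte)
    return GROUP_80[reg]


if __name__ == "__main__":
    # Byte sequence 80 E9 05: E9H = 11 101 001, so N = 5.
    print(split_modrm(0xE9))
    print(group_80_mnemonic(0xE9))
```

A disassembler built on the table would first match the required bytes, then use the reg field to pick the row when the entry is written as ea/N.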
Instruction Disassembly Table 


Instruction 
Bytes               Operation 
00 ea               ADD Eb, reg8 
*01 ea              ADD Ed, reg32 
02 ea               ADD reg8, Eb 
*03 ea              ADD reg32, Ed 
04 data8            ADD AL, data8 
*05 data32          ADD EAX, data32 
*06                 PUSH ES 
*07                 POP ES 
08 ea               OR Eb, reg8 
*09 ea              OR Ed, reg32 
0A ea               OR reg8, Eb 
*0B ea              OR reg32, Ed 
0C data8            OR AL, data8 
*0D data32          OR EAX, data32 
*0E                 PUSH CS 
0F 00 ea/0          SLDT Ew 
0F 00 ea/1          STR Ew 
0F 00 ea/2          LLDT Ew 
0F 00 ea/3          LTR Ew 
0F 00 ea/4          VERR Ew 
0F 00 ea/5          VERW Ew 
0F 01 ea/0          SGDT Ea 
0F 01 ea/1          SIDT Ea 
0F 01 ea/2          LGDT Ea 
0F 01 ea/3          LIDT Ea 
0F 01 ea/4          SMSW Ew 
0F 01 ea/6          LMSW Ew 
*0F 02 ea           LAR reg32, Ew 
*0F 03 ea           LSL reg32, Ew 
0F 06               CLTS 
0F 08               INVD 
0F 09               WBINVD 
0F 10 ea            INVLPG ea 
0F 20 —/n/reg       MOV CRn, reg32 
0F 21 —/n/reg       MOV DRn, reg32 
0F 22 —/n/reg       MOV reg32, CRn 
0F 23 —/n/reg       MOV reg32, DRn 
0F 24 —/n/reg       MOV TRn, reg32 
0F 26 —/n/reg       MOV reg32, TRn 
*0F 80 disp32       JO disp32 
*0F 81 disp32       JNO disp32 
*0F 82 disp32       JB disp32 (JB/JNAE) 
*0F 83 disp32       JNB disp32 (JNB/JAE) 
*0F 84 disp32       JZ disp32 (JZ/JE) 
*0F 85 disp32       JNZ disp32 (JNZ/JNE) 
*0F 86 disp32       JBE disp32 (JBE/JNA) 
*0F 87 disp32       JNBE disp32 (JNBE/JA) 


*0F 88 disp32       JS disp32 
*0F 89 disp32       JNS disp32 
*0F 8A disp32       JP disp32 (JP/JPE) 
*0F 8B disp32       JNP disp32 (JNP/JPO) 
*0F 8C disp32       JL disp32 (JL/JNGE) 
*0F 8D disp32       JNL disp32 (JNL/JGE) 
*0F 8E disp32       JLE disp32 (JLE/JNG) 
*0F 8F disp32       JNLE disp32 (JNLE/JG) 
0F 90 ea            SETO Eb 
0F 91 ea            SETNO Eb 
0F 92 ea            SETB Eb (SETB/SETNAE/SETC) 
0F 93 ea            SETNB Eb (SETNB/SETAE/SETNC) 
0F 94 ea            SETZ Eb (SETZ/SETE) 
0F 95 ea            SETNZ Eb (SETNZ/SETNE) 
0F 96 ea            SETBE Eb (SETBE/SETNA) 
0F 97 ea            SETNBE Eb (SETNBE/SETA) 
0F 98 ea            SETS Eb 
0F 99 ea            SETNS Eb 
0F 9A ea            SETP Eb (SETP/SETPE) 
0F 9B ea            SETNP Eb (SETNP/SETPO) 
0F 9C ea            SETL Eb (SETL/SETNGE) 
0F 9D ea            SETNL Eb (SETNL/SETGE) 
0F 9E ea            SETLE Eb (SETLE/SETNG) 
0F 9F ea            SETNLE Eb (SETNLE/SETG) 
*0F A0              PUSH FS 
*0F A1              POP FS 
*0F A3 ea           BT Ed, reg32 
*0F A4 ea data8     SHLD Ed, reg32, data8 
*0F A5 ea           SHLD Ed, reg32, CL 
0F A6               CMPXCHG Eb, reg8 
0F A7               CMPXCHG Ed, reg32 
*0F A8              PUSH GS 
*0F A9              POP GS 
*0F AB ea           BTS Ed, reg32 
*0F AC ea data8     SHRD Ed, reg32, data8 
*0F AD ea           SHRD Ed, reg32, CL 
*0F AF ea           IMUL reg32, Ed 
*0F B2 ea           LSS reg32, Ea 
*0F B3 ea           BTR Ed, reg32 
*0F B4 ea           LFS reg32, Ea 
*0F B5 ea           LGS reg32, Ea 
*0F B6 ea           MOVZX reg32, Eb 
*0F B7 ea           MOVZX reg32, Ew 
*0F BA ea/4 data8   BT Ed, data8 
*0F BA ea/5 data8   BTS Ed, data8 
*0F BA ea/6 data8   BTR Ed, data8 
Instruction 
Bytes               Operation 
*0F BA ea/7 data8   BTC Ed, data8 
*0F BB ea           BTC Ed, reg32 
*0F BC ea           BSF reg32, Ed 
*0F BD ea           BSR reg32, Ed 
*0F BE ea           MOVSX reg32, Eb 
*0F BF ea           MOVSX reg32, Ew 
0F C0               XADD Eb, reg8 
0F C1               XADD Ed, reg32 
0F C8               BSWAP EAX 
0F C9               BSWAP ECX 
0F CA               BSWAP EDX 
0F CB               BSWAP EBX 
0F CC               BSWAP ESP 
0F CD               BSWAP EBP 
0F CE               BSWAP ESI 
0F CF               BSWAP EDI 
10 ea               ADC Eb, reg8 
*11 ea              ADC Ed, reg32 
12 ea               ADC reg8, Eb 
*13 ea              ADC reg32, Ed 
14 data8            ADC AL, data8 
*15 data32          ADC EAX, data32 
*16                 PUSH SS 
*17                 POP SS 
18 ea               SBB Eb, reg8 
*19 ea              SBB Ed, reg32 
1A ea               SBB reg8, Eb 
*1B ea              SBB reg32, Ed 
1C data8            SBB AL, data8 
*1D data32          SBB EAX, data32 
*1E                 PUSH DS 
*1F                 POP DS 
20 ea               AND Eb, reg8 
*21 ea              AND Ed, reg32 
22 ea               AND reg8, Eb 
*23 ea              AND reg32, Ed 
24 data8            AND AL, data8 
*25 data32          AND EAX, data32 
26                  ES: 
27                  DAA 
28 ea               SUB Eb, reg8 
*29 ea              SUB Ed, reg32 
2A ea               SUB reg8, Eb 
*2B ea              SUB reg32, Ed 
2C data8            SUB AL, data8 
*2D data32          SUB EAX, data32 
2E                  CS: 
2F                  DAS 


30 ea               XOR Eb, reg8 
*31 ea              XOR Ed, reg32 
32 ea               XOR reg8, Eb 
*33 ea              XOR reg32, Ed 
34 data8            XOR AL, data8 
*35 data32          XOR EAX, data32 
36                  SS: 
37                  AAA 
38 ea               CMP Eb, reg8 
*39 ea              CMP Ed, reg32 
3A ea               CMP reg8, Eb 
*3B ea              CMP reg32, Ed 
3C data8            CMP AL, data8 
*3D data32          CMP EAX, data32 
3E                  DS: 
3F                  AAS 
*40                 INC EAX 
*41                 INC ECX 
*42                 INC EDX 
*43                 INC EBX 
*44                 INC ESP 
*45                 INC EBP 
*46                 INC ESI 
*47                 INC EDI 
*48                 DEC EAX 
*49                 DEC ECX 
*4A                 DEC EDX 
*4B                 DEC EBX 
*4C                 DEC ESP 
*4D                 DEC EBP 
*4E                 DEC ESI 
*4F                 DEC EDI 
*50                 PUSH EAX 
*51                 PUSH ECX 
*52                 PUSH EDX 
*53                 PUSH EBX 
*54                 PUSH ESP 
*55                 PUSH EBP 
*56                 PUSH ESI 
*57                 PUSH EDI 
*58                 POP EAX 
*59                 POP ECX 
*5A                 POP EDX 
*5B                 POP EBX 
*5C                 POP ESP 
*5D                 POP EBP 
*5E                 POP ESI 
*5F                 POP EDI 
Instruction 
Bytes               Operation 
*60                 PUSHAD 
*61                 POPAD 
*62 ea              BOUND reg32, Ea 
63 ea               ARPL Ew, reg16 
64                  FS: 
65                  GS: 
66                  OPSIZ: 
67                  ADRSIZ: 
*68 data32          PUSH data32 
*69 ea data32       IMUL reg32, Ed, data32 
6A data8            PUSH data8 
*6B ea data8        IMUL reg32, Ed, data8 
6C                  INSB 
*6D                 INSD 
6E                  OUTSB 
*6F                 OUTSD 
70 disp8            JO disp8 
71 disp8            JNO disp8 
72 disp8            JB disp8 (JB/JNAE) 
73 disp8            JNB disp8 (JNB/JAE) 
74 disp8            JZ disp8 (JZ/JE) 
75 disp8            JNZ disp8 (JNZ/JNE) 
76 disp8            JBE disp8 (JBE/JNA) 
77 disp8            JNBE disp8 (JNBE/JA) 
78 disp8            JS disp8 
79 disp8            JNS disp8 
7A disp8            JP disp8 (JP/JPE) 
7B disp8            JNP disp8 (JNP/JPO) 
7C disp8            JL disp8 (JL/JNGE) 
7D disp8            JNL disp8 (JNL/JGE) 
7E disp8            JLE disp8 (JLE/JNG) 
7F disp8            JNLE disp8 (JNLE/JG) 
80 ea/0 data8       ADD Eb, data8 
80 ea/1 data8       OR Eb, data8 
80 ea/2 data8       ADC Eb, data8 
80 ea/3 data8       SBB Eb, data8 
80 ea/4 data8       AND Eb, data8 
80 ea/5 data8       SUB Eb, data8 
80 ea/6 data8       XOR Eb, data8 
80 ea/7 data8       CMP Eb, data8 
*81 ea/0 data32     ADD Ed, data32 
*81 ea/1 data32     OR Ed, data32 
*81 ea/2 data32     ADC Ed, data32 
*81 ea/3 data32     SBB Ed, data32 
*81 ea/4 data32     AND Ed, data32 
*81 ea/5 data32     SUB Ed, data32 
*81 ea/6 data32     XOR Ed, data32 
*81 ea/7 data32     CMP Ed, data32 
*83 ea/0 data8      ADD Ed, data8 
*83 ea/1 data8      OR Ed, data8 
*83 ea/2 data8      ADC Ed, data8 
*83 ea/3 data8      SBB Ed, data8 
*83 ea/4 data8      AND Ed, data8 
*83 ea/5 data8      SUB Ed, data8 
*83 ea/6 data8      XOR Ed, data8 
*83 ea/7 data8      CMP Ed, data8 
84 ea               TEST Eb, reg8 
*85 ea              TEST Ed, reg32 
86 ea               XCHG Eb, reg8 
*87 ea              XCHG Ed, reg32 
88 ea               MOV Eb, reg8 
*89 ea              MOV Ed, reg32 
8A ea               MOV reg8, Eb 
*8B ea              MOV reg32, Ed 
8C ea/s             MOV Ew, sreg 
*8D ea              LEA reg32, Ea 
8E ea/s             MOV sreg, Ew 
*8F ea              POP Ed 
90                  NOP 
*91                 XCHG EAX, ECX 
*92                 XCHG EAX, EDX 
*93                 XCHG EAX, EBX 
*94                 XCHG EAX, ESP 
*95                 XCHG EAX, EBP 
*96                 XCHG EAX, ESI 
*97                 XCHG EAX, EDI 
*98                 CBW / CWDE 
99                  CWD 
9A offset32         CALL offset32 
9B                  WAIT 
*9C                 PUSHFD 
*9D                 POPFD 
9E                  SAHF 
9F                  LAHF 
A0 disp             MOV AL, [disp] 
*A1 disp            MOV EAX, [disp] 
A2 disp             MOV [disp], AL 
*A3 disp            MOV [disp], EAX 
A4                  MOVSB 
*A5                 MOVSD 
A6                  CMPSB 
*A7                 CMPSD 
A8 data8            TEST AL, data8 
*A9 data32          TEST EAX, data32 
AA                  STOSB 
*AB                 STOSD 
AC                  LODSB 
*AD                 LODSD 
AE                  SCASB 
*AF                 SCASD 
B0 data8            MOV AL, data8 
B1 data8            MOV CL, data8 
B2 data8            MOV DL, data8 
B3 data8            MOV BL, data8 
B4 data8            MOV AH, data8 
B5 data8            MOV CH, data8 
B6 data8            MOV DH, data8 
B7 data8            MOV BH, data8 
*B8 data32          MOV EAX, data32 
*B9 data32          MOV ECX, data32 
*BA data32          MOV EDX, data32 
*BB data32          MOV EBX, data32 
*BC data32          MOV ESP, data32 
*BD data32          MOV EBP, data32 
*BE data32          MOV ESI, data32 
*BF data32          MOV EDI, data32 
C0 ea/0 data8       ROL Eb, data8 
C0 ea/1 data8       ROR Eb, data8 
C0 ea/2 data8       RCL Eb, data8 
C0 ea/3 data8       RCR Eb, data8 
C0 ea/4 data8       SHL Eb, data8 
C0 ea/5 data8       SHR Eb, data8 
C0 ea/7 data8       SAR Eb, data8 
*C1 ea/0 data8      ROL Ed, data8 
*C1 ea/1 data8      ROR Ed, data8 
*C1 ea/2 data8      RCL Ed, data8 
*C1 ea/3 data8      RCR Ed, data8 
*C1 ea/4 data8      SHL Ed, data8 
*C1 ea/5 data8      SHR Ed, data8 
*C1 ea/7 data8      SAR Ed, data8 
C2 data16           RET data16 
C3                  RET 
*C4 ea              LES reg32, Ed 
*C5 ea              LDS reg32, Ed 
C6 ea data8         MOV reg8, data8 
*C7 ea data32       MOV reg32, data32 
C8 data16 data8     ENTER data16, data8 
C9                  LEAVE 
CA data16           RETF data16 
CB                  RETF 
CC                  INT 3 
CD data8            INT data8 
CE                  INTO 
CF                  IRET 
D0 ea/0             ROL Eb, 1 
D0 ea/1             ROR Eb, 1 
D0 ea/2             RCL Eb, 1 
D0 ea/3             RCR Eb, 1 
D0 ea/4             SHL Eb, 1 
D0 ea/5             SHR Eb, 1 
D0 ea/7             SAR Eb, 1 
*D1 ea/0            ROL Ed, 1 
*D1 ea/1            ROR Ed, 1 
*D1 ea/2            RCL Ed, 1 
*D1 ea/3            RCR Ed, 1 
*D1 ea/4            SHL Ed, 1 
*D1 ea/5            SHR Ed, 1 
*D1 ea/7            SAR Ed, 1 
D2 ea/0             ROL Eb, CL 
D2 ea/1             ROR Eb, CL 
D2 ea/2             RCL Eb, CL 
D2 ea/3             RCR Eb, CL 
D2 ea/4             SHL Eb, CL 
D2 ea/5             SHR Eb, CL 
D2 ea/7             SAR Eb, CL 
*D3 ea/0            ROL Ed, CL 
*D3 ea/1            ROR Ed, CL 
*D3 ea/2            RCL Ed, CL 
*D3 ea/3            RCR Ed, CL 
*D3 ea/4            SHL Ed, CL 
*D3 ea/5            SHR Ed, CL 
*D3 ea/7            SAR Ed, CL 
D4                  AAM 
D5                  AAD 
D7                  XLAT 
D8                  ESC 0 (NDP) 
D9                  ESC 1 (NDP) 
DA                  ESC 2 (NDP) 
DB                  ESC 3 (NDP) 
DC                  ESC 4 (NDP) 
DD                  ESC 5 (NDP) 
DE                  ESC 6 (NDP) 
DF                  ESC 7 (NDP) 
E0 disp8            LOOPNE disp8 (LOOPNE/LOOPNZ) 
E1 disp8            LOOPE disp8 (LOOPE/LOOPZ) 
E2 disp8            LOOP disp8 
E3 disp8            JCXZ disp8 
E4 data8            IN AL, data8 
*E5 data8           IN EAX, data8 
E6 data8            OUT data8, AL 
*E7 data8           OUT data8, EAX 
*E8 ea32            CALL ea32 
E9 disp32           JMP disp32 
*EA ea48            JMP FAR ea48 
EB disp8            JMP disp8 
EC                  IN AL, DX 
*ED                 IN EAX, DX 
EE                  OUT DX, AL 
*EF                 OUT DX, EAX 
F0                  LOCK 
F2                  REPNE/REPNZ 
F3                  REP/REPE/REPZ 
F4                  HLT 
F5                  CMC 
F6 ea/0 data8       TEST Eb, data8 
F6 ea/2             NOT Eb 
F6 ea/3             NEG Eb 
F6 ea/4             MUL AL, Eb 
F6 ea/5             IMUL AL, Eb 
F6 ea/6             DIV AL, Eb 
F6 ea/7             IDIV AL, Eb 
*F7 ea/0 data32     TEST Ed, data32 
*F7 ea/2            NOT Ed 
*F7 ea/3            NEG Ed 
*F7 ea/4            MUL EAX, Ed 
*F7 ea/5            IMUL EAX, Ed 
*F7 ea/6            DIV EAX, Ed 
*F7 ea/7            IDIV EAX, Ed 
F8                  CLC 
F9                  STC 
FA                  CLI 
FB                  STI 
FC                  CLD 
FD                  STD 
FE ea/0             INC Eb 
FE ea/1             DEC Eb 
*FF ea/0            INC Ed 
*FF ea/1            DEC Ed 
*FF ea/2            CALL Ed 
*FF ea/3            CALL FAR ea 
*FF ea/4            JMP Ed 
*FF ea/5            JMP FAR ea 
*FF ea/6            PUSH Ed 
80387/80486-NDP Extensions (NDP Escapes) 

Instruction 
Bytes               Operation 
D8 ea/0             FADD Real32 
D8 ea/1             FMUL Real32 
D8 ea/2             FCOM Real32 
D8 ea/3             FCOMP Real32 
D8 ea/4             FSUB Real32 
D8 ea/5             FSUBR Real32 
D8 ea/6             FDIV Real32 
D8 ea/7             FDIVR Real32 
D8 C0+i             FADD ST, ST(i) 
D8 C8+i             FMUL ST, ST(i) 
D8 D0+i             FCOM ST(i) 
D8 D8+i             FCOMP ST(i) 
D8 E0+i             FSUB ST, ST(i) 
D8 E8+i             FSUBR ST, ST(i) 
D8 F0+i             FDIV ST, ST(i) 
D8 F8+i             FDIVR ST, ST(i) 
D9 ea/0             FLD Real32 
D9 ea/2             FST Real32 
D9 ea/3             FSTP Real32 
D9 ea/4             FLDENV Ea 
D9 ea/5             FLDCW Ew 
D9 ea/6             FSTENV Ea 
D9 ea/7             FSTCW Ew 
D9 C0+i             FLD ST(i) 
D9 C8+i             FXCH ST(i) 
D9 D0               FNOP 
D9 E0               FCHS 
D9 E1               FABS 
D9 E4               FTST 
D9 E5               FXAM 
D9 E8               FLD1 
D9 E9               FLDL2T 
D9 EA               FLDL2E 
D9 EB               FLDPI 
D9 EC               FLDLG2 
D9 ED               FLDLN2 
D9 EE               FLDZ 
D9 F0               F2XM1 
D9 F1               FYL2X 
D9 F2               FPTAN 
D9 F3               FPATAN 
D9 F4               FXTRACT 
D9 F5               FPREM1 
D9 F6               FDECSTP 
D9 F7               FINCSTP 
D9 F8               FPREM 
D9 F9               FYL2XP1 
D9 FA               FSQRT 
D9 FB               FSINCOS 
D9 FC               FRNDINT 
D9 FD               FSCALE 
D9 FE               FSIN 
D9 FF               FCOS 
DA ea/0             FIADD Int32 
DA ea/1             FIMUL Int32 
DA ea/2             FICOM Int32 
DA ea/3             FICOMP Int32 
DA ea/4             FISUB Int32 
DA ea/5             FISUBR Int32 
DA ea/6             FIDIV Int32 
DA ea/7             FIDIVR Int32 
DA E9               FUCOMPP 
DB ea/0             FILD Int32 
DB ea/2             FIST Int32 
DB ea/3             FISTP Int32 
DB ea/5             FLD Real80 
DB ea/7             FSTP Real80 
DB E2               FCLEX 
DB E3               FINIT 
DC ea/0             FADD Real64 
DC ea/1             FMUL Real64 
DC ea/2             FCOM Real64 
DC ea/3             FCOMP Real64 
DC ea/4             FSUB Real64 
DC ea/5             FSUBR Real64 
DC ea/6             FDIV Real64 
DC ea/7             FDIVR Real64 
DC C0+i             FADD ST(i), ST 
DC C8+i             FMUL ST(i), ST 
DC E0+i             FSUBR ST(i), ST 
DC E8+i             FSUB ST(i), ST 
DC F0+i             FDIVR ST(i), ST 
DC F8+i             FDIV ST(i), ST 
DD ea/0             FLD Real64 
DD ea/2             FST Real64 
DD ea/3             FSTP Real64 
DD ea/4             FRSTOR Ea 
DD ea/6             FSAVE Ea 
DD ea/7             FSTSW Ew 
DD C0+i             FFREE ST(i) 
DD D0+i             FST ST(i) 
DD D8+i             FSTP ST(i) 
DD E0+i             FUCOM ST(i) 
DD E8+i             FUCOMP ST(i) 
DE ea/0             FIADD Int16 
DE ea/1             FIMUL Int16 
DE ea/2             FICOM Int16 
DE ea/3             FICOMP Int16 
DE ea/4             FISUB Int16 
DE ea/5             FISUBR Int16 
DE ea/6             FIDIV Int16 
DE ea/7             FIDIVR Int16 
DE C0+i             FADDP ST(i), ST 
DE C8+i             FMULP ST(i), ST 
DE D9               FCOMPP 
DE E0+i             FSUBRP ST(i), ST 
DE E8+i             FSUBP ST(i), ST 
DE F0+i             FDIVRP ST(i), ST 
DE F8+i             FDIVP ST(i), ST 
DF ea/0             FILD Int16 
DF ea/2             FIST Int16 
DF ea/3             FISTP Int16 
DF ea/4             FBLD Bcd80 
DF ea/5             FILD Int64 
DF ea/6             FBSTP Bcd80 
DF ea/7             FISTP Int64 
DF E0               FSTSW AX 
Appendix F 


8086-FAMILY 
PROCESSOR 
DIFFERENCES 


Although the 8086, 80286, 80386, and 80486 are object-code compatible, minor dif- 
ferences among them have arisen during the evolution of this microprocessor 
family. This appendix describes these differences. 


Real-Mode Differences Between the 8086 and 
the 80386/80486 


The 8086 processor does not generate exceptions 6, 8-13, 16, or the 80486-unique 
exception 17. 


Instructions execute more rapidly on the 80386 and 80486 than on the 8086. 


On the 80386/80486, the divide fault (INT 0) leaves the saved CS:EIP pointing to the 
faulting instruction. On the 8086, the value of CS:IP on the stack points to the 
instruction after the one that caused the fault. 


Opcodes that were not explicitly defined on the 8086 are interpreted as new in- 
structions or cause the undefined opcode fault (INT 6) when executed on the 80386 
or 80486. 


When the PUSH SP instruction is executed, the value placed on the stack by the 80386 
or 80486 is the predecremented value of SP, whereas the value pushed on the 8086 is 
the postdecremented value. If it is necessary to re-create the 8086 stack value, use 
the following sequence of instructions in place of PUSH SP: 


PUSH BP 
MOV BP, SP 
XCHG BP, [BP] 
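
The effect of the three-instruction sequence can be checked with a small model. The sketch below is an illustration only: it treats the stack as a Python dictionary keyed by 16-bit offset, ignores segmentation, and uses hypothetical function names. It shows that the replacement sequence leaves the postdecremented SP on the stack and restores BP.

```python
def push_sp_8086(sp, memory):
    """Model PUSH SP on the 8086: SP is decremented, then stored."""
    sp = (sp - 2) & 0xFFFF
    memory[sp] = sp
    return sp


def push_sp_80386(sp, memory):
    """Model PUSH SP on the 80386/80486: the old SP is stored."""
    value = sp
    sp = (sp - 2) & 0xFFFF
    memory[sp] = value
    return sp


def push_sp_replacement(sp, bp, memory):
    """Model PUSH BP / MOV BP, SP / XCHG BP, [BP]."""
    sp = (sp - 2) & 0xFFFF      # PUSH BP
    memory[sp] = bp
    bp = sp                     # MOV BP, SP
    addr = bp                   # XCHG BP, [BP]
    saved = memory[addr]
    memory[addr] = bp
    bp = saved
    return sp, bp


if __name__ == "__main__":
    mem = {}
    sp, bp = push_sp_replacement(0x0100, 0x1234, mem)
    print(hex(sp), hex(bp), hex(mem[sp]))
```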
The count value for shift and rotate instructions is taken modulo 32 in the 80386 and 
80486. The full value (up to 255) is used on the 8086, which can result in long 
instruction execution times. 
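
A short model makes the difference visible. This is a sketch under simple assumptions: only a 16-bit SHL is modeled, and the `cpu` strings and function names are hypothetical labels for illustration.

```python
def effective_count(count, cpu):
    """Return the shift count the processor actually uses."""
    count &= 0xFF                 # the count register (CL) is 8 bits wide
    if cpu in ("80386", "80486"):
        return count & 0x1F       # taken modulo 32
    return count                  # 8086: full value


def shl16(value, count, cpu):
    """Model a 16-bit SHL with the processor-specific count."""
    n = effective_count(count, cpu)
    return (value << n) & 0xFFFF


if __name__ == "__main__":
    print(shl16(1, 33, "8086"))   # every bit is shifted out
    print(shl16(1, 33, "80386"))  # count 33 behaves like count 1
```

Code that relies on a large count to clear a register on the 8086 therefore produces a different result on later processors.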


An instruction (including prefixes) cannot exceed 15 bytes. If it does, a general pro- 
tection fault occurs. This does not occur under normal circumstances but might oc- 
cur if you use multiple redundant prefixes. The 8086 has no such restrictions. 


Operands cannot extend across the segment bounds. If, for example, an instruction 
refers to a 16-bit operand at offset 65535, a general protection fault occurs. If the 
stack pointer is set to low memory (offset 2) and a 32-bit value is pushed, a stack 
fault occurs. In the 8086, addresses wrap around the segment boundary and are 
continuous from 65535 to 0. Instruction fetches behave like operand fetches in this respect. 
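
The operand case can be sketched as a bounds check against the 64-KB real-mode segment limit. The names below are hypothetical, and the model covers only ordinary (expand-up) operand accesses, not the stack case.

```python
SEGMENT_LIMIT = 0xFFFF  # last valid offset in a real-mode segment


def operand_offsets_8086(offset, size):
    """Offsets touched on the 8086, where addresses wrap to 0."""
    return [(offset + i) & 0xFFFF for i in range(size)]


def access_faults_80386(offset, size):
    """True if the operand would extend past the segment limit."""
    return offset + size - 1 > SEGMENT_LIMIT


if __name__ == "__main__":
    # A 16-bit operand at offset 65535 (FFFFH).
    print(operand_offsets_8086(0xFFFF, 2))
    print(access_faults_80386(0xFFFF, 2))
```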


You can use the LOCK prefix only with certain instructions; otherwise, an 
undefined opcode fault occurs. (See Chapter 8 for a list of the legal instructions.) The 
8086 has no such restrictions. 


Sometimes the 8086 hangs while single-stepping. Later processors do not hang 
because the interrupt priorities are slightly different: if a hardware interrupt is 
invoked, no single-step trap occurs until the handler returns. 


The 8086 generates a divide fault if the quotient of an IDIV instruction is the largest 
possible negative number. The 80386 and 80486 generate the correct result. See the 
earlier discussion of the divide fault in this appendix. 


When the content of the FLAGS register is pushed onto the stack, bits 12-15 are al- 
ways 1’s on the 8086. These bits represent new flags on later processors. 
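
As a small illustration of this difference, the sketch below models the word that the 8086 stores when FLAGS is pushed. The function names are hypothetical, and the check is only a model of the stored value, not a complete processor-identification routine.

```python
UPPER_BITS = 0xF000  # bits 12-15 of the FLAGS word


def pushed_flags_8086(flags):
    """Word stored by the 8086: bits 12-15 always read as 1."""
    return (flags & 0xFFFF) | UPPER_BITS


def upper_bits_all_set(word):
    """True if bits 12-15 of a saved FLAGS word are all 1."""
    return (word & UPPER_BITS) == UPPER_BITS


if __name__ == "__main__":
    image = pushed_flags_8086(0x0002)
    print(hex(image), upper_bits_all_set(image))
```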


The NMI interrupt masks all subsequent NMIs until an IRET is executed. NMIs are 
not masked on the 8086. 


The 80386 uses INT 16 as the coprocessor error vector. On the 8086, the system 
hardware must be programmed to generate an interrupt vector, and it can be any 
vector. On the 80486, you can select either mode of operation. 


When an NDP exception occurs on an 80386 or an 80486, the saved CS:EIP points 
to the faulting instruction, including any prefixes that might be part of the instruc- 
tion. On the 8086, the saved CS:IP points only to the ESC portion of the faulting 
NDP instruction. 


Additional interrupts can occur if a program contains undetected bugs, such as the 
use of unimplemented opcodes or addressing beyond segment boundaries. 


The 8086 is limited to 1 MB of address space by having 20 physical hardware ad- 
dress lines. Using selectors such as FFFFH can result in linear addresses beyond 1 
MB, but because there are only 20 address lines, the addresses wrap around to 0. 
Because there are 32 address lines on the 80386 and 80486, addresses greater than 
1 MB can be generated in real mode (up to 10FFEFH). If system software depends 
on the ability to wrap around to 0 after 1 MB, hardware must be added to an 80386 
system to force the twenty-first address line (A20) to 0 in real mode. The 80486 has 
this hardware on the processor chip. 
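
The wraparound can be worked through numerically. The sketch below assumes the usual real-mode address formation (segment value times 16 plus offset); the function names and the `a20_forced_low` parameter are hypothetical names for the external masking hardware described above.

```python
def linear_address(segment, offset):
    """Real-mode address: segment * 16 + offset."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


def physical_8086(segment, offset):
    """8086: only 20 address lines, so the address wraps at 1 MB."""
    return linear_address(segment, offset) & 0xFFFFF


def physical_80386_real(segment, offset, a20_forced_low=False):
    """80386/80486 real mode, optionally with line A20 forced to 0."""
    address = linear_address(segment, offset)
    if a20_forced_low:
        address &= ~(1 << 20)
    return address


if __name__ == "__main__":
    print(hex(linear_address(0xFFFF, 0xFFFF)))             # highest real-mode address
    print(hex(physical_8086(0xFFFF, 0x0010)))              # wraps to 0
    print(hex(physical_80386_real(0xFFFF, 0x0010)))        # just past 1 MB
    print(hex(physical_80386_real(0xFFFF, 0x0010, True)))  # masked back to 0
```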


Differences Between Virtual 8086 Mode and 
the 8086 


All previously listed differences also apply to V86 mode in comparison to real mode 
on the 8086. Following are some additional differences. 


I/O instructions in V86 mode are allowed only if the I/O permission bitmap for the 
V86 mode task is set up. 


All exceptions (hardware and software interrupts) vector to the protected-mode 
IDT entries rather than through the real-mode interrupt mechanism. The protected- 
mode handlers must simulate the real-mode vector process when appropriate. 


Differences Between the 80286 and the 
80386/80486 


As implemented on the 80286, the LOCK prefix causes all memory to be locked dur- 
ing the prefixed instruction. On the 80386 and 80486, only the memory accessed by 
the prefixed instruction is locked. 


On RESET, any of the registers that contained undefined values on the 80286 can 
contain different values on later processors. 


Differences Between the 80386 and the 80486 


The 80486 will optionally generate an alignment fault on any memory reference in- 
struction of more than a single byte. 


New bits have been defined in the control registers, page table entries, and flags. 


Differences Between the 8087 and the 
80387/80486-NDP 


Errors are signaled via a dedicated hardware pin on the 80387/80486-NDP instead 
of the standard CPU interrupt mechanism. The 80386 responds to coprocessor 
errors via interrupts 7, 9, and 16 instead of an external hardware interrupt. The 80486 
generates interrupts 7 and 16 but not interrupt 9. 


The format of the error information in the 80387/80486-NDP environment varies 
depending on whether the processor is in real mode or in protected mode. The 
8087 supports only real-mode information. 
The instructions FENI/FDISI are no-ops on the 80387/80486-NDP. 


The 8087 does not perform automatic normalization of denormalized reals. Instead, 
it signals a denormal exception and relies on the application to perform this opera- 
tion. The 80387/80486-NDP will normalize these values and might execute faster if 
the denormal exception is masked when running 8087 programs. 


The 8087 requires explicit WAIT instructions before each floating-point instruction 
to synchronize with the 8086. The 80386 and the 80387 perform automatic 
synchronization, as do the 80486 and its NDP. The WAIT instructions are unnecessary, 
but they will not cause the program to operate incorrectly. 


Differences Between the 80287 and the 
80387/80486-NDP 


The FSETPM instruction is a no-op on the 80387. 


The 80287 supports both affine and projective closure. Only affine closure is sup- 
ported on the 80387/80486-NDP. Programs that rely on projective closure may gen- 
erate different results on the 80387/80486-NDP than on the 80287. 


Differences Between the 80387 and the 80486 


Interrupt 9 will not be generated on the 80486. Interrupt 13 will be generated 
instead. 


The 80486 supports redirected error reporting of floating-point errors via the NE bit 
in CR0 and the FERR\ and IGNNE\ hardware pins. 
Index 


A 


AAA (ASCII Adjust After Addition) 
163 
AAD (ASCII Adjust Before Division) 
164 
AAM (ASCII Adjust After 
Multiplication) 165 
AAS (ASCII Adjust After Subtraction) 
166 
abort (exception class) 
coprocessor segment overrun 
(INT 9) 124 
defined 119-20 
double fault (INT 8) 123-24 
accessed (A) bit 56 
access rights 56, 93, 152 
accumulator 2, 71, 79, 89 
ADC (Add with Carry) 167 
ADD (Integer Addition) 168 
addresses 
effective 75, 89 
linear 18, 29, 48-50, 129, 138-40 
physical 18, 49, 135-37, 139-40, 143, 
151 
segment/offset 51 
virtual 49—52, 110, 135-37 
addressing modes. See instruction 
operands 
address translation 
virtual to linear 51-52 
virtual to physical 135-36 
affine closure 42 
AH register 24 
alias segments 111, 130 
alignment check (AC) bit 25-26, 127 
alignment check fault (INT 17) 119, 
127 
alignment fault (INT 17H) 25 
alignment mask (AM) bit 30, 127 
AL register 24 
AND (Boolean AND) 169 


architecture 
80486 microarchitecture 19—20 
CPU microarchitecture 16-19 
evolution of 80x86 family 1-12 
instruction set 20—23 
arithmetic instructions 
floating-point 96—97 
integer 78—80 
arithmetic shifts 82—83 
ARPL (Adjust RPL) 170-71 
array indexing. See scaling 
ASCII 
character set 403 
instructions 80 
numeric format 22—23 
assembler notation conventions xiii 
associative memories 147 
auxiliary carry flag (AF) 28 
available (AVL) bit 56 
AX register 24 


back link. See link field 
base address 
of the GDT 29 
of the IDT 28 
of a segment 51-53, 57, 106, 150 
based addressing 
alone 73 
plus displacement 73 
plus displacement plus index 
75-76 
base pointer (EBP) register 4, 73 
base registers 73-76 
BCD instructions 
floating-point 95 
integer 80 
BCD numeric format 22—23, 33, 
37-38 
BH register 24 
bias, floating-point exponent 34-36 
big (B) bit 56 

binary fractions 33-36, 41 

BIST (built-in self test) (80486) 151 
bit instructions 80-82 

bit strings 20, 81-82 


bit test. See BT (Bit Test); BTC/80486 
(Bit Test and Complement); 
BTR (Bit Test and Reset); BTS 
(Bit Test and Set) 

BL register 24 

Boolean instructions 81—82 

BOUND (Check Array Boundaries) 
172-73 

bounds check fault (INT 5) 119, 122 

BP register 4, 24 

branch instructions 86-88 

breakpoint registers 129-33 

breakpoint trap 122 

BSF (Bit Scan Forward) 174-75 

BSR (Bit Scan Reverse) 176-77 

BSWAP (Byte Swap) 84, 178 

BT (Bit Test) 179-80 

BTC/80486 (Bit Test and 
Complement) 181-82 

BTR (Bit Test and Reset) 183-84 

BTS (Bit Test and Set) 185-86 

built-in self test. See BIST (built-in 
self test) (80486) 

Bus Interface Unit (BIU) 16 

bus lock (LOCK \) 84, 92, 140 

busy (B) bit 40, 114 

busy TSS 112, 116 

bytes 21 

BX register 24 


C 


cache 
descriptor 18, 28, 39 
internal (80486) 19-20, 145—48 
page table 18, 140-41 
cache disable (CD) bit 30 
CALL (Procedure Calls) 187-89 
call gate 104-5, 112, 129 
carry flag (CF) 28 
CBW (Convert Byte to Word) 190 
CDQ (Convert Doubleword to 
Quadword) 191 
CH register 24 
CLC (Clear Carry Flag) 192 
CLD (Clear Direction Flag) 193 
CLI (Clear Interrupt Flag) 194 
clock signal 15 
CL register 24 
CLTS (Clear Task Switched Bit) 195 
CMC (Complement Carry Flag) 196 
CMP (Compare Integers) 197—98 
CMPS (Compare String) 199-200 
CMPXCHG (Compare and Exchange) 
201-2 
code segments 56 
concurrency. See multitasking 
support 
condition codes. See also NDP, 
register set 
EFLAGS register 25—28, 84, 86 
Jcc (Jump if Condition) 87-88 
SET cc (Set Byte on Condition) 
84-85 
conforming segments 56 
control instructions 97—98 
control registers (CRO—CR3) 30-32, 
84, 115, 126-27, 139, 150 
control transfer instructions 86-88 
control word (CW) register 42—43 
coprocessor 
emulation of 31 
environment 97 
instructions 94—99 
introduction of 6-7 
monitor 31 
numeric formats 33—38 
registers 38 
coprocessor error fault (INT 16) 127 
coprocessor not available fault 
(INT 7) 123 


coprocessor segment overrun (INT 9) 
124 
CPU microarchitecture 16-19 
CS segment register 28, 49, 56, 102, 
150-51 


current privilege level (CPL) 26, 55, 
102-3, 141 

CWD (Convert Word to Doubleword) 
203 

CWDE (Convert Word to Doubleword 
Extended) 204 

CX register 4, 24 


DAA (Decimal Adjust AL After 
Addition) 205 
DAS (Decimal Adjust AL After 
Subtraction) 206 
data segments 56 
data transfer instructions 84-85 
data types 
ASCII 22-23 
BCD 22-23, 37-38 
bit strings 20 
bytes 21 
conventions xii-xiii 
doublewords (dwords) 22 
integers 21 
long reals (double-precision) 
33-37 
quadwords (qwords) 22 
short reals (single-precision) 33-37 
temp reals (extended-precision) 
33, 34, 35 
words 21 
debug breakpoints 122, 128-33 
debug exception (INT 1) 117, 122 
debug registers 32, 130-33 
DEC (Decrement) 207 
decimal instructions 80 
default (D) bit 56, 152 
denormal exception (DE) bit 42 
denormal floating-point numbers 37 
denormal operand mask (DM) bit 43 
descriptor cache 18, 28, 39. See also 
shadow registers 
descriptor formats 51-59, 111-13 
descriptor privilege level (DPL) 102-3 
descriptor tables 51, 58, 109-11 
descriptor type (TYPE) field 56, 112 


DH register 24 
differences, 8086-family processor 
463-66 

direct addressing 71-72 
direction flag (DF) 69, 88 
directory, page table 126, 138-41 
DI register 24 
dirty (D) bit 138 
disable interrupts (CLI) 27, 157-58 
displacement 68, 72, 73, 75—76 
DIV (Unsigned Division) 208—9 
divide fault 

80386-family (INT 0) 122 

80387 (ZE exception) 42 
DL register 24 
double fault (INT 8) 119, 123-24 
double-precision format (long real) 
33-37 

double shift 83 
doublewords (dwords) 22 
DS segment register 5, 49, 88-89 
dword count field 105 
dwords 22 
DX register 24 


E 
EAX register 23, 24, 151 
EBP register 23, 24, 73, 75 


EBX register 23, 24 


ECX register 23, 24, 88, 89, 91 

EDI register 23, 24, 88 

EDX register 23, 24 

EFLAGS register 25-28, 79, 84, 86

EIP register 77 

emulate math coprocessor (EM) bit 
31, 123 

enable interrupts (STI) 27, 157-58

ENTER (Enter New Stack Frame) 
210-11 

equal (branch condition) 86 

error codes 121-27

ERROR\ pin 127 

error pointer registers 44-45, 97

error summary (ES) bit 41 

ESI register 24, 88 
ESP register 24, 73, 77
ES register 5, 49, 88 
exception masks 41, 43 
exceptions. See also floating-point 
exceptions 
80386/80486 
aborts 119-20 
protected-mode handling 118-28 
real-mode handling 152-53 
traps 118-19 
virtual 8086-mode (V86-mode) 
handling 159-60 
NDP 
conditions 41-43
mask bits 41, 43 
execute-only segments 56 
execution unit 18 
expand-down segments 56-57 
exponent, floating-point 33-36 
extended-precision floating point 
(temp real) 33, 36, 41, 43 
extension type (ET) bit 31 


F 


FABS (Absolute Value) 330 

FADD (Addition) 331-32 

FAR CALLs and JMPs 115, 125, 129, 
154-55 

faults. See exceptions 

FBLD (BCD Load) 333 

FBSTP (BCD Store and Pop) 334 

FCHS (Change Sign) 335 

FCLEX (Clear Exceptions) 336 

FCOM (Compare) 337-38 

FCOS (Cosine) 339 

FDECSTP (Decrement Stack Pointer) 
340 

FDIV (Division) 341-42 

FDIVR (Division Reversed) 343-44

FFREE (Free NDP Register) 345 

FIADD (Integer Addition) 346

FICOM (Integer Compare) 347

FIDIV (Integer Division) 348

FIDIVR (Integer Division Reversed) 
349 
FILD (Integer Load) 350

FIMUL (Integer Multiplication) 351 

FINCSTP (Increment Stack Pointer) 
352 

FINIT (Initialize NDP) 353 

FIST (Integer Store) 354 

FISUB (Integer Subtraction) 355

FISUBR (Integer Subtraction 
Reversed) 356 

flag register (EFLAGS) 25-28, 86 

FLD (Load Real) 357 

FLD const (Load Constant) 358 

FLDCW (Load Control Word) 359 

FLDENV (Load Environment) 360 

floating-point condition codes 41-43

floating-point environment 97 

floating-point exceptions 41-43, 123,
127 

floating-point formats 33-36 

floating-point instruction set 329 

floating-point support 32-33

FLUSH\ 147-48 

FMUL (Multiplication) 361 

FNOP (No Operation) 363 

FPATAN (Partial Arctangent) 364 

FPREM (Partial Remainder) 365-66

FPREM1 (IEEE Partial Remainder)
367-68 

FPTAN (Partial Tangent) 369 

fraction, binary 33-36, 41

fragmentation 63, 137 

frame pointer. See stack frame 

FRNDINT (Round to Integer) 370 

FRSTOR (Restore NDP State) 371-72 

FSAVE (Save NDP State) 373-74 

FSCALE (Scale by 2^n) 375

FSETPM (Set Protected Mode) 376 

FSIN (Sine) 377 

FSINCOS (Sine and Cosine) 378 

FSQRT (Square Root) 379 

FS segment register 28, 49, 91 

FST (Store Floating Point) 380 

FSTCW (Store Control Word) 381 

FSTENV (Store Environment) 382 

FSTSW (Store Status Word) 383 


FSUB (Subtraction) 384-85 

FSUBR (Subtraction Reversed) 
386-87 

FTST (Test for Zero) 388 

FUCOM (Unordered Compare) 
390-91 

FWAIT (Wait Until Not Busy) 33, 392 

FXAM (Examine Top of Stack) 
393-94 

FXCH (Exchange Stack Elements) 395 

FXTRACT (Extract Floating-Point 
Components) 396 

FYL2X (Compute Y x log2 X) 397

FYL2XP1 (Compute Y x log2 (X + 1))
398

F2XM1 (Compute 2^x - 1) 399


G 

gates 102, 104-6, 112-13, 120-21, 125 

GDTR register 29, 58, 107, 155 

general protection fault (INT 13) 52,
90, 119, 126, 157 

Global Descriptor Table (GDT) 29, 
58, 93, 106-11, 121 

global enable (G0-G3) bits 132

granularity (G) bit 54, 142, 155 

greater than (branch condition) 87 

GS segment register 28, 49, 91 


H
HLT (Halt) 93, 212 


I

IDIV (Integer (Signed) Division)
213-14 

IDTR register 29, 58, 106, 150, 152, 154 

IEEE-754 floating-point format 7, 31

immediate operands 70 

implicit operands 69 

IMUL (Integer (Signed)
Multiplication) 215 

IN (Input from I/O Port) 216-17 

INC (Increment) 218 

index addressing
with base plus displacement 75-76
plus displacement 73-75
index field 107, 109 
infinity 36, 42 
infinity control bit 42 
initial processor state 149-50 
input 
instruction 70-71, 90-91 
protection checks 117, 157 
INS (Input String from I/O Port) 85,
219-20 
instruction categories 
arithmetic 78-80 
control transfer 86-88 
data transfer 84-85 
decimal arithmetic 80 
I/O 90-91 
logical 80-83
miscellaneous 94
pointer manipulation 89-90
prefix 91-93 
stack 85-86 
string 88-89 
system 93 
instruction decode unit 18 
instruction disassembly table 455-61 
instruction formats and timings 15, 
68, 417-53 
instruction operands 
immediate 70 
implicit 69 
I/O 70-71 
memory reference (see memory 
reference operands) 
register 69 
instruction prefetch queue 17 
instruction prefetch unit 17 
INT (Software Interrupt) 221-22 
integer data format 
80386 21 
NDP 33 
Interrupt Descriptor Table (IDT)
118-20, 142, 154-55, 159 
interrupt enable flag (IF) 27 
interrupt gates 104, 112-13, 120-21 
interrupts
disabling/enabling 27, 126, 157-58 
exceptions, faults, and traps 118-30 
gates for 104, 112-13, 120-21
hardware 27, 118, 119 
masking 128 
in real mode 152-53 
software 27, 86, 119, 129, 159 
in virtual 8086 mode 159-60 
INTO (Interrupt on Overflow) 223
invalid opcode fault (INT 6) 122-23
invalid operation exception (IE) bit
42
invalid TSS fault (INT 10) 124-25
INVD (Invalidate Cache) 93, 224
INVLPG (Invalidate TLB Entry) 93,
225
I/O
instructions 70-71, 90-91
operands 70-71 
permission bitmap 117-18, 157 
permission checks 117, 157 
in protected mode 117 
in virtual 8086 mode 157 
I/O privilege level (IOPL) 26, 90, 117,
157-59
IRET (Interrupt Return) 226


J 
Jcc (Jump if Condition) 227-28 
JMP (Near, Far Jump) 229-30 


K 
KEN\ 148 


L 


LAHF (Load AH with Flags) 231 

LAR (Load Access Rights) 232-33 

LDS (Load DS) 246 

LDTR register 29, 115, 116, 125 

LEA (Load Effective Address) 234 

LEAVE (Leave Current Stack Frame) 
235 

LES (Load ES) 246 

less than (branch condition) 87
LFS (Load FS) 246

LGDT (Load GDT Register) 236 

LGS (Load GS) 246 

LIDT (Load IDT Register) 237 

limit 54-55, 150-51 

linear addresses 18, 29, 48-50, 129,
138-40 

linear memory, vs. segmented
memory 47-49

link field 115, 116 

LLDT (Load LDT Register) 238 

LMSW (Load Machine Status Word) 
239 

local descriptor table (LDT) 106-7, 
109-12 

local enable (L0-L3) bits 132

LOCK (Assert Hardware LOCK\ 
Signal Prefix) 240-41 

LODS (Load String) 242-43 

logical shifts 82-83 

long real format (double-precision) 
33-37 

LOOP cc (Loop Decrement ECX and 
Branch) 244-45 

Lseg (Load Segment Register) 246 

LSL (Load Segment Limit) 247-48 

LSS (Load SS) 246 

LTR (Load Task Register) 249 


M
machine status word (MSW) 30, 155
mask bits 41 
memory read/write breakpoints 
128-33 
memory reference operands 
based 73 
based plus displacement 73 
based plus index plus displacement 
75-76 
direct 71-72
index plus displacement 73-75
stack 76-78
memory segments 4-6, 47-57
microarchitecture 16-20
microcycle 67-68


modes 
protected 7-8, 10, 31, 101-33, 
154-55 
real 7, 149-53, 156 
transitions between 31, 156 
virtual 8086 10, 142-43, 156-60
monitor coprocessor (MP) bit 31, 123 
MOV (Move Data/Selector/Special) 
250-52 
MOVS (Move String) 253-54 
MOVSX (Move with Sign Extension)
255
MOVZX (Move with Zero Extension) 
256 
MUL (Unsigned Multiplication) 
257-58 
multiprocessing 113 
multitasking support 31, 48, 59, 106, 
113-17, 142-45 


N
NaN (Not a Number) 37
native mode 92, 149, 152 
NDP 
data formats supported 33-38
defined 32
register set 38-45
NEG (Negate Integer) 259 
negative number formats 
floating-point 34 
integer 21 
nested task (NT) flag 26, 116 
Non-Maskable Interrupt (NMI) 27, 
118, 128, 158 
NOP (No Operation) 260 
NOT (Boolean Complement) 261 
not present fault (INT 11) 125 
no write-through (NW) bit 30 
null selector 109, 159 
numeric data processor. See NDP 
numeric formats 
BCD 22-23, 33, 37-38 
floating-point 33-36 
integer 21 
numerics exception (NE) bit 30, 127 




O


offset 51 
opcodes, tables of 405-15 
OR (Boolean OR) 262 
OUT (Output to Port) 263-64 
output 
instruction 70-71, 90-91 
privilege checking 117, 157 
OUTS (Output String) 265-66 
overflow exception (OE) bit 42 
overflow flag (OF) 26-27, 85, 88, 118, 
122 
overflow trap (INT 4) 119, 122 
override prefixes 
address 91, 152 
operand size 92, 152-53 
segment 49, 71, 91-92, 125-26 


P


page cache disable (PCD) bit 138 
Page Directory Entry (PDE) 138-40 
page enable (PG) bit 30, 137, 156 
page fault (INT 14) 126-27, 138, 141 
page frames 125 
page granularity (G) bit 54 
Page Table Entry (PTE) 138-42 
page write-through (PWT) bit 138 
paging 135-42, 157 
paging unit 18 
parallelism 16, 33 
parity flag (PF) 28, 85, 88 
permission checks 
I/O 117, 157
between privilege rings 104-9
segment access 101-4
physical addresses 18, 49, 135-37, 
139-40, 143, 151 
pipelining 14, 16 
pointer registers 3, 57, 71-78 
pointers 65, 89-90 
POP (Pop Segment Register/Value Off 
Stack) 267-68 
POPA (Pop All General Registers
16-bit) 269
POPAD (Pop All General Registers
32-bit) 270 
POPF (Pop Stack into FLAGS) 271 
POPFD (Pop Stack into EFLAGS) 272 
powers of two 401 
precision, floating-point 33 
precision control (PC) field 43 
precision exception (PE) bit 41, 155 
prefix instructions 
ADRSIZ 92, 152 
LOCK 92, 140, 155 
OPSIZ 92, 152 
repeat 89, 91 
segment override 49, 71, 91-92, 
125-26 
present (P) bit 
in descriptor 55, 125 
in page table 126-27, 138 
privilege levels 
current (CPL) 26, 55, 102-3, 141
descriptor (DPL) 102
paging and privilege 141-42
rings 102-4
transitions between 104-11 
processor differences. See 
differences, 8086-family 
processor 
projective closure 42 
protected mode 
introduction to 7-8 
mechanism (80386/80486) 10,
101-33
switching into/away from 154-56 
protect enable (PE) bit 31, 156 
protection 
of pages 126-27, 141-42 
of segments 101-4 
PUSH (Push Value onto Stack) 273-74 
PUSHA (Push 16-Bit General 
Registers) 275 
PUSHAD (Push 32-Bit General 
Registers) 276 
PUSHF (Push 16-Bit EFLAGS Register) 
277 
PUSHFD (Push EFLAGS Register) 278 
Q


quadwords 22 
quiet NaN 37 


R
RAM, intelligent 146-47
RCL (Rotate Through Carry Left) 
279-80 
RCR (Rotate Through Carry Right) 
281-82 
readable code segments 56 
read-only data segments 56, 62 
read/write (R/W) bit 
for debugging 131 
for paging 141 
READY\ 94 
real mode 7, 149-53 
real-mode differences, 8086-family 
processor. See differences, 
8086-family processor 
real number formats 33-36
register operands 69
registers
breakpoint 129-33
control 30-32, 84, 115, 126-27, 139,
150 
debug and test 32, 130-33 
general 23-25 
NDP 38-45 
protection 29 
segmentation (see segment 
registers) 
REP (Repeat String Prefix) 283 
requested privilege level (RPL) 58, 59, 
107-9, 116, 125 
RESET 149-51 
resume flag (RF) 26, 128 
RET (Near Return from Subroutine) 
284 
RETF (Far Return from Subroutine) 
285 
return 
from interrupt 120, 160 
from subroutine 77 
from task switch 116, 120 


rings, protection 8, 102-4

ROL (Rotate Left) 286-87 

ROR (Rotate Right) 288-89 

rotate instructions 83 

rounding control (RC) field 42-43


S
SAHF (Store AH in EFLAGS) 290
SAL (Shift Left Arithmetic) 291-92 
SAR (Shift Right Arithmetic) 293-94 
SBB (Subtraction with Borrow) 295 
scaling 74, 93 
SCAS (Scan String) 296-97 
seg (Segment Override Prefix) 298. 
See also segment override prefix 
segmentation 
address translation in 47-52 
combining paging and 142 
introduction of 4-6
protection in 52-57 
segmentation unit 18 
segment override prefix 49, 71, 91-92, 
125-26 
segment registers 
description of 28-29 
initialization of 149-50 
introduction of 4-5
loading and storing 84, 109, 115, 125, 
160 
in virtual addressing 50, 64 
selector 51, 58-59, 101-2, 107-10, 115
self test. See BIST (built-in self test) 
(80486) 
SET cc (Set Byte on Condition)
299-300
SGDT (Store GDT Register) 301 
shadow registers 116, 150-51. See also 
descriptor cache 
shared segments 62-63, 142 
SHL (Shift Left Logical) 302-3
SHLD (Shift Left Double) 304 
short real (single-precision) format
33-36
SHR (Shift Right Logical) 305-6
SHRD (Shift Right Double) 307 




shutdown 123-24, 158 

SIDT (Store IDT Register) 308 

signaling NaN 37 

sign flag (SF) 27, 85, 87, 88 

significand 33-35 

single-precision (short real) format 
33-36 

single stepping 129, 153 

SI register 24 

SLDT (Store LDT Register) 309 

SMSW (Store Machine Status Word) 
310 

software breakpoints 122 

software interrupts 27, 86, 119, 129, 
159 

SP register 4, 24 

SS segment register 28-29, 49, 76,
106, 159 

stack-based addressing 76-78 

stack fault (INT 12) 57, 77, 125-26

stack fault (SF) bit 41 

stack frame 9, 73, 77 

stack overflow 76 

stack pointer (ESP) register 23-24, 73 

status word (SW) register 38, 40-43 

STC (Set Carry Flag) 311 

STD (Set Direction Flag) 312 

STI (Set Interrupt Flag) 313 

STOS (Store String) 314-15 

STR (Store Task Register) 316 

string instructions 27, 88-89, 91

SUB (Subtraction) 317 

subroutine call 4, 77, 86 

supervisor pages 141-42

swapping 

pages 136-39, 144-45 
segments 56, 60-62, 125
syntax conventions xiv 


T 


table indicator (TI) bit 58, 107, 109, 
121 

tag word (TW) register 38, 44 

task gate 104, 113, 120, 125 

task (TR) register 29, 106, 115 
Task State Segment (TSS) 29, 106-7,
111-18, 124, 125, 129
task switched (TS) bit 31, 123
task switching 115-17
task switch trap (T) bit 129, 132
temp real (extended-precision)
format 33, 36, 41, 43
TEST (Test Bits) 318
test registers 52, 107
thrashing 62
top-of-stack (TOP) field 40-41
translation lookaside buffer (TLB) 18,
116, 140-41, 156
trap flag (TF) 27, 129, 153
trap gates 104, 112, 120, 125
traps 118-19, 122, 129
two-phase clock 15
type (TYPE) field 56, 112-13, 114


U
undefined opcode fault (INT 6)
122-23
underflow exception (UE) bit 40, 41
unmasked exceptions 41
unsigned comparisons 85, 87
user level pages 141-42
user/supervisor (U/S) bit 127, 141


V
V86 mode. See virtual 8086 mode
vector table 119, 152-53
VERR (Verify Read Access) 319-20
VERW (Verify Write Access) 321-22
virtual 8086 mode 10, 26, 156-60
virtual addresses 49-52, 110, 135-37
virtual memory 49-52, 55, 59-64,
135-37
virtual mode (VM) bit 26, 157, 159, 160


W
WAIT (Wait Until Not Busy) 323
WBINVD (Write Back and Invalidate
Cache) 93, 324
whetstone 32
word count field. See dword count
field
words 21
writable data segments 56
write protect (WP) bit 30, 141


X
XADD (Exchange and Add) 325
XCHG (Exchange) 326
XLATB (Translate Byte) 327
XOR (Boolean Exclusive OR) 328


Z
zero divide exception (ZE) bit 42
zero divide fault (INT 0) 122
zero flag (ZF) 27, 85, 87, 88